The present record of my lectures and conferences would not be complete without the acknowledgment of my deep gratitude to the organizing committee appointed by Dr. A. F. Woods, the Director of the Graduate School, for the kind invitation extended to me to lecture in that important centre, and to the audience for the friendly reception and interest offered to my talks.


I owe a special and very warm indebtedness to Dr. W. Edwards Deming, who was kind enough to advise me on the topics that would be of interest to the audience, and who planned and thought out all the details of my one week's stay in Washington, which I so thoroughly enjoyed.


It is a great honour for me to have spoken at the Graduate School of the United States Department of Agriculture, the more so as the audience included many eminent statisticians whose work in various directions I have greatly admired. When communicating to them the results of my own studies and those of the persons with whom I was, or still am, associated, I hoped for occasions to learn myself. These were amply provided by the discussions which followed the lectures and conferences. The questions put to me both before and after my talks, and also the critical remarks, were most interesting and suggestive. Frequently they referred to some practical or theoretical difficulties encountered in the statistical work carried out on such an imposing scale in the United States. Some of these problems were quite new to me, and I was not able to offer any reply to many a question asked. Later on, however, I managed to produce some of the answers required, and these, I hope, will soon be published elsewhere. Other questions and remarks suggested that some points in my lectures were not sufficiently clearly presented. Consequently, I tried to introduce the necessary amendments to make things clearer, and the present draft differs in places from what I actually said. In this particular respect I owe very much to Dr. Deming, whose inquisitiveness and friendly criticism helped much in improving the clarity and accuracy of presentation.


It is a pleasure to record a similar indebtedness to Mr. Milton Friedman, Dr. Charles F. Sarle, Mr. Frederick F. Stephan, Dr. Sidney Wilcox, and others, who were kind enough to lend me their attention and help.














It is again to Dr. Deming that I am most grateful for his idea of publishing the present book and for having taken infinite trouble in producing it in the excellent form in which it actually appears.


I must add one remark concerning the contents of the lectures and conferences as they appear in the present publication. It will be seen that some of them are concerned with pure theory, others with various applications: problems of plant breeding, those of randomized and systematic arrangements of agricultural trials, of sampling human populations, and of time series analysis. Needless to say, the audiences varied considerably from one lecture or conference to another. Consequently, when speaking on applications requiring a reference to certain details of the theory, I did not hesitate to mention them at some length, even if I had already had the occasion of discussing the same point at a previous lecture or conference. This was necessary because many of the listeners of one conference did not attend the others. In drafting the present publication, we have thought it best to avoid some of the repetitions that actually occurred. However, it was thought wise not to omit them altogether, since, if that had been done, each of the conferences dealing with applications to practical problems would not be a closed unit in itself.


J. Neyman.


The Cell, Little Hampden, 
Great Missenden,
Buckinghamshire, 

28 August 1937 


FOREWORD FROM THE EDITOR 


It may not be out of place to recall that the relation between editor and author is different from that between co-authors. An editor is responsible for clarity, cross-references, citations to literature, proof-reading, and general appearance, but, save for notes actually signed "editor," he is not responsible for the actual content of the material, however extensively he may have revised it and contributed to it. On the other hand, co-authors are jointly accountable for every portion of an article to which their names are attached.


Anticipating that the statistical methods developed in this book will have considerable theoretical and economic value in agriculture, Dr. A. F. Woods, Director of the Graduate School, and Dr. C. H. Kunsman, Chief of the Fertilizer Research Division of the Bureau of Chemistry and Soils, have put the facilities of their offices at the disposal of the editor, including whatever portion of his time could be spared from other pursuits.


The editing of this book for Dr. Neyman has been a pleasant task, made so by the enthusiastic assistance of many friends; it could hardly have been produced except for a fortunate chain of additions of their efforts. Dr. Neyman himself devoted many days of the summer of 1937 to revising the edited record of the lectures and conferences originally delivered in Washington, and has then and since been exceedingly patient in dealing with suggestions from the editor. It is also a pleasure to record in particular the valued assistance of Messrs. B. R. Stauber, Alexander Sturges, Milton Friedman, W. Allen Wallis, Otis A. Pope, and Frederick F. Stephan in various matters. The original recording of the conferences was done by Miss Helen Evans: if more expert reporters there be, the editor would shun the assignment of finding one. The typing is mostly the work of Mr. Stanley J. Magdurakas, who, though unacquainted with mathematics, was picking up the editor's slips before the job was finished.


The primary reason for inserting this foreword was to provide space for apologies for flaws. For example, in order to make use of certain previously published graphs, some inconsistencies and infelicities in nomenclature were allowed to stand. Unfortunately the typewriter was not properly equipped with small figures for exponents and subscripts until most of the stencils had been cut. With more time, these and other imperfections could have been adjusted, but in view of the pressure for the appearance of the work, it will not be further delayed.


W. Edwards Deming 
THE MODERN VIEWPOINT ON THE CLASSICAL THEORY OF PROBABILITY
AND ITS APPLICATIONS, TESTS FOR STATISTICAL HYPOTHESES.


Three lectures delivered at the
Graduate School of the U.S. Department of Agriculture

by

J. Neyman


INTRODUCTION 


Since the original titles of my lectures were fixed, I have received a number of letters from members of the prospective audience, and those letters forced me to modify the original programme.


The conception of probability has been discussed and defined in many different ways, each having its own advantages. It must be emphasized that although the respective theories frequently contradict each other, this does not necessarily mean that some of them are wrong. Any theory is correct so long as the axioms on which it is based are not mutually contradictory and there are no errors in deductions. Among the existing systems of axioms and the theories deducible from them we may make a choice. In this we shall naturally be guided by considerations of usefulness or, what frequently amounts to the same, by our personal taste. It is important, however, to make clear in which theory one is working; otherwise unnecessary misunderstandings may arise.


In my first lecture I shall describe the basic ideas of the theory of probability that I prefer to others, and which I have always had in mind when working on the theories of testing statistical hypotheses and of estimation.


So far as I am aware, these views of mine are shared by E. S. Pearson and other workers attached to the Department of Statistics at University College, London. It may be, therefore, that the present lectures will help in understanding the whole of the work carried on in that centre.


It would be useless, of course, to try to develop the entire theory of probability during two or three lectures only. Therefore I shall concentrate on the general ideas, definitions, etc. Details of the theory of probability treated from the same point of view, though perhaps using different wordings, may be found in various books and papers, of which I shall mention the following:


1. H. Cramér: Random variables and probability distributions. Cambridge, 1937.

2. M. Fréchet: Recherches théoriques modernes sur la théorie des probabilités. Gauthier-Villars, Paris.

3. A. Kolmogoroff: Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin, Springer, 1933.


The second lecture will be given entirely to the question of the possibility of applying the mathematical theory of probability to practical problems. The ideas developed there have grown out of reading such writers as E. Borel, L. v. Bortkiewicz, Karl Pearson, and undoubtedly others, but it is difficult to give exact quotations.


In the third and last lecture I shall deal with a somewhat narrower, but still rather broad, question: what is the meaning of a test of a statistical hypothesis, and what are the grounds for choosing between several alternative tests? Material for that lecture is essentially taken from an article of mine, published in 1929 in the Reports of the First Congress of Slavonic Mathematicians in Warsaw. Its title is "Méthodes nouvelles de vérification des hypothèses statistiques."





LECTURE I: ON THE THEORY OF PROBABILITY


1. Definition of Probability. The probability that I shall define will always relate to an object of a specified kind, say A, having a certain property, say B. Thus we may speak of the probability of a ball having the property of being black, of a person 36 years of age "having the property" of dying during the next twelve months, etc. It has been usual to define probability referring either to events or to propositions. Obviously the choice is very much a matter of convenience, and it seems to me that speaking of the probabilities of objects having certain properties is convenient. Besides, it will be noticed that, adopting this nomenclature, we may speak also of probabilities of events. These will mean the probabilities of events having the property of actually occurring. Also it will be possible to speak of probabilities of propositions, which will mean the probabilities of propositions having the property of being true. The adopted system of expressions therefore seems to be no less general than the others.


In mathematical definitions the actual wordings used do not matter very much. However, they do have some importance, as they may appeal to intuition with different strengths and may differently emphasize the essential source of the concepts introduced. The essential point in the conception of probability I am going to use is that it will always refer to a specified set of objects, which I shall describe as the fundamental probability set. This point is emphasized in the wording adopted, since we agree to speak of the probability of a specified object A having a property B. It will be noticed that the process of specifying the object A is equivalent to specifying, or perhaps even enumerating, all objects that are "A" in distinction from others that are not. Now all objects A will form what I shall call the fundamental probability set (F.P.S. for short). This will also be denoted by (A).*


It is obvious that in order for one to be able to enumerate all objects A, those objects must be well defined by a specification of one or more properties distinguishing the objects A from all others. This property will also be denoted by the same letter A.


Before proceeding any further I shall explain the terms logical sum and logical product of two or more properties. Let $B_1$ and $B_2$ be any two properties. The property $B_3$ is a logical sum (or sum for short) of $B_1$ and $B_2$ if it consists in our object possessing at least one of the properties $B_1$ and $B_2$, and for this sum we shall write $B_3 = B_1 + B_2$. It will be convenient to use an expression like "an object $B_1 + B_2$" to denote an object possessing the property $B_1 + B_2$, etc.

A property $B_4$ will be called a logical product (or simply product for short) of the properties $B_1$ and $B_2$ if it consists in an object possessing both $B_1$ and $B_2$. We shall accept the notation $B_4 = B_1B_2$ and use the expression "an object $B_1B_2$" to denote an object possessing the property $B_1B_2$.


The above definitions are immediately extended to the sum and product of any number of properties.


Turning now to the definition of the probability of an object A possessing the property B, I want to emphasize that it requires the enumeration of all the objects A actually possessing the property B, i.e. all the objects possessing the property AB. According to the conventions already established, the set of these will be denoted by (AB).
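The conventions for logical sums and products translate directly into operations on predicates. The sketch below is an illustration under our own assumptions, not notation from the lecture: a property is modelled as a Python function returning True or False, objects are integers, and the names `logical_sum`, `logical_product` and `subset_with` are hypothetical. The faces of a die serve as a small F.P.S.

```python
from typing import Callable, Iterable, List

# Assumption: a "property" is modelled as a predicate on objects (here, ints).
Property = Callable[[int], bool]


def logical_sum(*props: Property) -> Property:
    """B1 + B2 + ...: the object possesses at least one of the properties."""
    return lambda obj: any(p(obj) for p in props)


def logical_product(*props: Property) -> Property:
    """B1 B2 ...: the object possesses every one of the properties."""
    return lambda obj: all(p(obj) for p in props)


def subset_with(prop: Property, fps: Iterable[int]) -> List[int]:
    """Enumerate (AB): the objects of the F.P.S. (A) that possess prop."""
    return [obj for obj in fps if prop(obj)]


def even(face: int) -> bool:
    return face % 2 == 0


def high(face: int) -> bool:
    return face >= 5


faces = [1, 2, 3, 4, 5, 6]  # hypothetical F.P.S.: the faces of a die
print(subset_with(logical_sum(even, high), faces))      # [2, 4, 5, 6]
print(subset_with(logical_product(even, high), faces))  # [6]
```

Modelling properties as predicates keeps the sum and product closed under further combination, which mirrors the remark that the definitions extend to any number of properties.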


Up to the present time our considerations have been perfectly general. Owing to the fact that the mathematical theory of sets is not commonly known, further steps leading to the definition of probability will have to be discussed twice: first on the assumption that the fundamental probability set (A) is finite, and next on the assumption that it is anything, finite or infinite.


Suppose that the fundamental probability set (A) is finite, and denote by n the number of objects it contains. Further, let k be the number of objects belonging to (A) and having the property B. The probability of an object A having the property B will be defined as the ratio k/n, and will be denoted by

$$P\{B \mid A\} = k/n. \tag{1}$$

In other words, the probability of an object A having the property B is defined as the proportion of objects A having the property B. The expression "the probability of an object A having a property B" is, of course, somewhat lengthy; we shall therefore use some abbreviations such as "the probability of B," but it is necessary to remember the full meaning of these words.

* Any letter, e.g. x, in parentheses stands for "all x." This notation is in common use.


Whenever there is no danger of misunderstanding, the above notation can be simplified. For instance, if the probabilities that are calculated in the course of solving a particular problem refer always to the same fundamental probability set (A), the A may be omitted in the symbol of probability, whereupon $P\{B\}$ will suffice for $P\{B \mid A\}$. Sometimes, however, we shall have to deal not only with one fundamental probability set (A), but also with one or more others, each forming a part of (A). For instance, besides dealing with the probability of an object A having a certain property B', we might deal also with the probability of an object AB having the same property B' (or some other). In such cases the probabilities referring to objects A may be written without specifying their set, while probabilities referring to objects AB may not be; thus $P\{B' \mid AB\}$ may be shortened to $P\{B' \mid B\}$, and $P\{B' \mid A\}$ may be shortened to $P\{B'\}$.
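In the finite case, definition (1) is nothing more than a count. The following Python sketch is an illustration under our own assumptions: objects are integers, a property is a predicate, and the function name `probability` is hypothetical. It refuses an empty set, in line with the later remark in the text that the definition requires a non-empty F.P.S., and it uses exact fractions so that k/n is not rounded.

```python
from fractions import Fraction
from typing import Callable, Sequence


def probability(prop: Callable[[int], bool], fps: Sequence[int]) -> Fraction:
    """P{B|A} = k/n for a finite fundamental probability set (A)."""
    n = len(fps)  # number of objects A
    if n == 0:
        # The definition has no meaning for an empty F.P.S.
        raise ValueError("the fundamental probability set must not be empty")
    k = sum(1 for obj in fps if prop(obj))  # number of objects A having B
    return Fraction(k, n)


def is_six(face: int) -> bool:
    return face == 6


faces = [1, 2, 3, 4, 5, 6]  # the six faces of the die of Example 1
print(probability(is_six, faces))  # 1/6
```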








It is most important to distinguish the probabilities $P\{B' \mid A\}$ and $P\{B' \mid AB\}$. The former is the proportion of all objects A having the property B', while the latter is the proportion of the objects having the property AB that have in addition the property B'. Special care in distinguishing these two concepts is needed when we use shorter expressions and notations.


In order to emphasize this distinction we shall sometimes describe $P\{B' \mid A\}$ as the absolute probability of B' and $P\{B' \mid AB\}$ as the relative probability of B' given B. The relative probability of B' given B may or may not be equal to the absolute probability of B'. If it is, then we say that the property B' is independent of B.
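The difference between the absolute probability $P\{B' \mid A\}$ and the relative probability $P\{B' \mid AB\}$ is only a difference in the set over which the proportion is taken. The sketch below assumes, for illustration, a hypothetical F.P.S. consisting of the integers 1 to 12; the helper names are our own. It shows one property independent of "even" and one that is not. It also raises an error when B is an impossible property, since (AB) is then empty.

```python
from fractions import Fraction
from typing import Callable, Sequence

Property = Callable[[int], bool]


def probability(prop: Property, fps: Sequence[int]) -> Fraction:
    """Proportion of the elements of a finite, non-empty set having prop."""
    if not fps:
        raise ValueError("probability set must not be empty")
    return Fraction(sum(1 for x in fps if prop(x)), len(fps))


def relative_probability(b_prime: Property, b: Property, fps: Sequence[int]) -> Fraction:
    """P{B'|B}, i.e. P{B'|AB}: the proportion is taken over (AB), not over (A).

    Raises ValueError if B is an impossible property, because (AB) is empty.
    """
    ab = [x for x in fps if b(x)]
    return probability(b_prime, ab)


def is_independent(b_prime: Property, b: Property, fps: Sequence[int]) -> bool:
    """B' is independent of B when P{B'|B} equals the absolute P{B'}."""
    return relative_probability(b_prime, b, fps) == probability(b_prime, fps)


A = list(range(1, 13))  # hypothetical F.P.S.: the integers 1, ..., 12
even = lambda x: x % 2 == 0
div3 = lambda x: x % 3 == 0
prime = lambda x: x in (2, 3, 5, 7, 11)

print(probability(div3, A), relative_probability(div3, even, A))      # 1/3 1/3
print(probability(prime, A), relative_probability(prime, even, A))    # 5/12 1/6
print(is_independent(div3, even, A), is_independent(prime, even, A))  # True False
```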


It will be noticed that the definition of probability applies only to cases where the fundamental probability set is not empty, that is to say, only when it contains at least one element. Otherwise the word probability would have no meaning. It follows that whenever we speak of a probability, we imply that the fundamental probability set is not empty.


It follows from the definition that the probability P of any property E is a number between zero and unity, inclusive. If P = 0, none of the elements of the F.P.S. has the property E. In this case we can conveniently describe E as an impossible property. If, on the other hand, P = 1, it follows that the property E belongs to each of the elements of the F.P.S., and it (the property E) may be described as the only possible property. It is easily seen that the converses are also true, namely that if $E_1$ and $E_2$ are an impossible and the only possible property respectively, then $P\{E_1\} = 0$ and $P\{E_2\} = 1$. It will be noticed that the relative probability $P\{B' \mid B\}$ of B' given B has a definite meaning only if B is not an impossible property.

The characteristic features of the above definition of probability are (i) that it refers to sets of objects and (ii) that it does not involve any reference to "equally probable" cases. In order to emphasize the consequences of the definition I shall discuss a few examples.


Example 1. A die has six faces, one and only one of which has six points on it. The probability of a side of the die having six points on it will be, according to our definition, always 1/6. No experiments with die casting are able to alter this conclusion.


Example 2. The probability of a side of the die having six points on it must be distinguished from the probability of getting six points on the die when casting.


Reading this last sentence once more and comparing it with the definition of probability (Eq. 1), one will easily see that, without further description of the situation, the definition of probability could not be applied to castings. Speaking of "the probability of getting six points on the upper side of a die when casting" and trying to apply the definition of probability, we may think of various things.


(a) We may think of a set of 100 castings already carried out. Then there will be no difficulty in calculating the probability required.

(b) We may think of a set of some 100 future castings. In that case the probability required, say $P\{\text{six}\}$, will be simply unknown. To establish its value, we should carry out the castings and count the cases with "six."

(c) Finally, we may have in mind some hypothetical series of castings and discuss various probabilities referring to it. Usually such discussions consist in deducing values of one or more probabilities from the assumed hypothetical values of some others. Some examples of such discussions will be found later.


Of the three ways of interpreting the ambiguously stated problem concerning the probability of getting "six" on a die when casting, the last is the most fruitful. We shall see this a little further on, when I speak of the so-called empirical law of big numbers.


Example 3. Consider the familiar expansion π = 3.14159... and denote by $x_{1000}$ its thousandth decimal. What is the probability $P\{x_{1000} = 5\}$ of its being equal to 5? Here the question is not ambiguous and the answer is immediately found: the value of the probability $P\{x_{1000} = 5\}$ is actually unknown, but it is certainly either zero or unity. In fact, there is but one object satisfying the definition of $x_{1000}$. Therefore the fundamental probability set consists of one element only, and the denominator in the expression (1) serving as the definition of probability is equal to unity. The numerator may be equal to unity (this if $x_{1000}$ is actually equal to 5) or to zero, if $x_{1000} \neq 5$. As the decimals in the expansion of π are known only to 707 places, $x_{1000}$ is unknown, and therefore we do not know whether $P\{x_{1000} = 5\}$ is zero or unity.
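The point of Example 3 is structural: the F.P.S. has a single element, so definition (1) can only yield 0 or 1. Far more than 707 decimals of π are known today, so the value can simply be computed. The sketch below does this under our own choice of method, Machin's formula π = 16 arctan(1/5) - 4 arctan(1/239) in fixed-point integer arithmetic, which is not discussed in the lecture. The function names are hypothetical. The code deliberately prints no claimed value for the probability; it only demonstrates that the answer is 0 or 1.

```python
from fractions import Fraction


def pi_decimals(count: int) -> str:
    """First `count` decimals of pi, via Machin's formula in fixed point."""
    guard = 10  # extra digits to absorb accumulated truncation error
    unity = 10 ** (count + guard)

    def arctan_inv(x: int) -> int:
        # unity * arctan(1/x) = unity * (1/x - 1/(3x^3) + 1/(5x^5) - ...)
        total = term = unity // x
        n, sign = 1, -1
        while term:
            term //= x * x
            n += 2
            total += sign * (term // n)
            sign = -sign
        return total

    pi_scaled = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    digits = str(pi_scaled // 10 ** guard)  # "31415926..."
    return digits[1 : count + 1]


def p_x1000_equals_5() -> Fraction:
    """P{x_1000 = 5} over a one-element F.P.S.: necessarily 0 or 1."""
    fps = [int(pi_decimals(1000)[999])]  # the single object x_1000
    k = sum(1 for x in fps if x == 5)
    return Fraction(k, len(fps))


print(pi_decimals(10))  # 1415926535
print(p_x1000_equals_5())
```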


As I have mentioned before, the probabilities may refer to some hypothetical probability sets with assumed properties. This case is the one that the theory is most often concerned with, and it is of extreme importance. Therefore I shall give two illustrations.


Example 4. Consider a set $F_1$ of n die castings, and denote by $F_2$ the set of $\frac{1}{2}n(n-1)$ different pairs that may be formed out of them, no element to be repeated in a pair. If certain properties of the set $F_1$ are given, we may calculate the probability, say $P\{\text{six, six} \mid F_2\}$, of a pair of castings with two "sixes," referring it to $F_2$ as the fundamental probability set. The property of $F_1$ that is needed for the calculation of $P\{\text{six, six} \mid F_2\}$ consists in the probability $P\{\text{six} \mid F_1\}$ of getting a six in one casting. Assume, for example, that

$$P\{\text{six} \mid F_1\} = 1/6. \tag{2}$$

This would mean that among the n castings in $F_1$ there are exactly n/6 with six on the top face of the die, from which we could conclude that among the $\frac{1}{2}n(n-1)$ pairs of castings forming $F_2$ there are exactly

$$\frac{1}{2}\cdot\frac{n}{6}\left(\frac{n}{6}-1\right) = \frac{n(n-6)}{72} \tag{3}$$

such pairs that consist of two "sixes," and therefore that the probability

$$P\{\text{six, six} \mid F_2\} = \frac{n-6}{36(n-1)}. \tag{4}$$


It will be seen that the above result is purely hypothetical: if the connection between $F_1$ and $F_2$ is as described above, and if the probability of a specified property ("six") calculated with regard to $F_1$ is 1/6, then $P\{\text{six, six} \mid F_2\} = (n-6)/[36(n-1)]$. Thus, if the probability set $F_1$ has the properties specified in the conditions of the problem, then formula (4) holds good. We may notice at this stage that the properties of a probability set $F_2$ that are relevant for the calculation of probabilities may be given indirectly by specifying certain properties of some other set $F_1$ (or of many other such sets), and by describing the connection between $F_1$ and $F_2$. A similar position prevails also in the following example.
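Formula (4) can be checked by brute force: build a hypothetical $F_1$ with exactly n/6 sixes, enumerate $F_2$ as all unordered pairs with no repeated element, and count. The sketch below makes one assumption for illustration: the sixes are placed last, which does not affect the count. The function names are our own.

```python
from fractions import Fraction
from itertools import combinations


def pair_probability(n: int) -> Fraction:
    """P{six, six | F2}, counted directly over F2 = all pairs of castings in F1."""
    if n % 6 != 0:
        raise ValueError("n must be a multiple of 6 for exactly n/6 sixes")
    sixes = n // 6
    # Hypothetical F1: which castings are sixes does not affect the count, so
    # put the sixes last ("s" = six, "r" = not-six).
    f1 = ["r"] * (n - sixes) + ["s"] * sixes
    f2 = list(combinations(range(n), 2))  # n(n-1)/2 pairs, no repetition
    both_six = sum(1 for i, j in f2 if f1[i] == f1[j] == "s")
    return Fraction(both_six, len(f2))


def formula_4(n: int) -> Fraction:
    """Right-hand side of formula (4): (n-6) / (36(n-1))."""
    return Fraction(n - 6, 36 * (n - 1))


for n in (12, 18, 60):
    assert pair_probability(n) == formula_4(n)
print(pair_probability(12))  # 1/66
```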





Example 5. Consider a series of n hypothetical experiments and assume that each of these experiments results either in an event E or in a failure to produce E, described as non-E. Assume further that a separate probability set, consisting of the same number m of elements each, is connected with each of the experiments, and denote by $F_i$ the set corresponding to the ith experiment, $i = 1, 2, \ldots, n$. Suppose that, whatever be i, the probability of the event E calculated with regard to $F_i$ is the same p, that is,

$$P\{E \mid F_i\} = p. \tag{5}$$


We may now consider still another probability set, say $F_0$, the elements of which are all possible combinations of the elements of the sets $F_1, F_2, \ldots, F_n$ taken n at a time, each element selected from a different set. If each of the sets $F_1, F_2, \ldots, F_n$ consists of the same number m of elements, then the set $F_0$ will consist of $m^n$ elements.


The assumed properties of the sets $F_1, F_2, \ldots, F_n$ and their connection with $F_0$ permit the calculation of various probabilities referring to $F_0$. For instance, we may calculate the probability, say $P_{n,k}$, which is frequently picturesquely described as that of getting the event E exactly k times in the course of n independent trials in which the probability of E is permanently equal to p. This probability is easy to calculate and is known to be equal to

$$P_{n,k} = \frac{n!}{k!\,(n-k)!}\,p^k(1-p)^{n-k}. \tag{6}$$

But it is important to know what this formula denotes. It is no more and no less than the proportion of elements of the set $F_0$ that have the desired property, consisting of k "events" E and $n-k$ "events" non-E.


In this example, again, the calculation of the probability $P_{n,k}$ referring to the probability set $F_0$ was based on the probabilities referring to the sets $F_1, F_2, \ldots, F_n$ and on the structure of the elements of $F_0$, each of them being composed of those of $F_1, F_2, \ldots, F_n$.
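To make concrete the statement that formula (6) is "no more and no less than the proportion of elements of the set $F_0$" with the desired property, the sketch below builds $F_0$ explicitly as the Cartesian product of n hypothetical sets $F_i$ of m elements each, using `itertools.product`. Of these m elements, `e_count` are marked as E, so that p = e_count/m. The sketch then compares the resulting count with the binomial formula. The function names and the encoding of E as True are our own illustrative choices.

```python
from fractions import Fraction
from itertools import product
from math import comb


def enumerate_p_nk(n: int, m: int, e_count: int, k: int) -> Fraction:
    """Proportion of elements of F0 = F1 x ... x Fn with exactly k events E.

    Each hypothetical F_i has m elements, e_count of which are E (True),
    so P{E | F_i} = e_count / m for every i.
    """
    f_i = [True] * e_count + [False] * (m - e_count)
    f0 = list(product(f_i, repeat=n))  # m**n elements
    hits = sum(1 for element in f0 if sum(element) == k)
    return Fraction(hits, len(f0))


def binomial_p_nk(n: int, p: Fraction, k: int) -> Fraction:
    """Formula (6): n! / (k! (n-k)!) * p**k * (1-p)**(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


print(enumerate_p_nk(n=2, m=6, e_count=1, k=2))  # 1/36
for k in range(4):
    assert enumerate_p_nk(3, 6, 1, k) == binomial_p_nk(3, Fraction(1, 6), k)
```

The case n = 2, m = 6 with a single E in each set reproduces the 6 x 6 arrangement used in the illustration of $F_0$ later in the text.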


This is a typical situation, and it will be convenient to introduce special terminology for its description. If the elements of any probability set $F_0$ are some combinations of those of some other sets $F_1, F_2$, etc., we shall say that the set $F_0$ is of a higher order than the sets $F_1, F_2, \ldots$. Thus we may distinguish probability sets of first, second, third, etc., orders.


In example 4 the set $F_1$ was of first, and the set $F_2$ of second order. In example 5 the sets $F_1, F_2, \ldots, F_n$ were of first order and the set $F_0$ of the second. It is easy to construct examples in which there will be probability sets of three or more successive orders.


In what I have just said I used the expressions "experiments," "results," "events," which were not directly involved in the definition of probability. I want to emphasize that these expressions are no more than a picturesque description of fundamental probability sets and, if purity of language were demanded, they should really not be used.


However, these and similar expressions are very frequent in all works on probability. They were established in olden days, when the point of view regarding probability theory was somewhat different. We hold on to them now because of their convenience. This point will be discussed later, when I speak of applications and of the law of big numbers.


We shall notice now that a description of an experiment as in the examples above amounts really to a description of a probability set. As those sets were classified, so will be classified the corresponding hypothetical experiments. Therefore we shall speak of experiments of the first, second, etc., order.


In order to clear away any possible misunderstanding, let us consider again the probability sets involved in the last two examples, and illustrate them graphically. The set $F_1$ of example 4 may be represented by the use of the letter s for a six and the letter r for a not-six. With n = 12 we shall have the following picture:

    r   r   r   r   r   r   r   r   r   r   s   s
    1   2   3   4   5   6   7   8   9  10  11  12

The numbers 1 to 12 below the line represent the ordinal numbers of the elements of $F_1$.

To represent $F_2$ diagrammatically it will be convenient to use two dimensions. Each element of $F_2$ is represented by rr, rs, sr, or ss, the rectangular coordinates x and y of which are equal to the ordinal numbers of the two elements of $F_1$ making up one element of $F_2$. As x can never be equal to y, i.e., no element of $F_1$ is to be repeated, it is permissible to take x > y. There will be only one element of $F_2$ possessing the property "six-six" (ss), that composed of the eleventh and twelfth elements of $F_1$. It may be seen from the upper chart on the next page that the number of elements forming $F_2$ is 66 and that therefore $P\{\text{six, six} \mid F_2\} = 1/66$, which agrees exactly with formula (4) above if n therein be set equal to 12.

We may now illustrate the connection between the probability sets $F_0$ and $F_1, F_2, \ldots, F_n$ of example 5. Let us put $k = n = 2$, $m = 6$ and $p = 1/6$, so that among the six elements forming either $F_1$ or $F_2$ there will be only one possessing the property E, the other five being non-E, denoted by G. Let E in both sets be the 6th element. Any element of $F_0$ is formed by combining an element of $F_1$ with some element of $F_2$. Therefore it will be convenient to represent $F_0$ by points on a plane, of which the coördinates x and y are equal to the ordinal numbers of the elements of $F_1$ and $F_2$ whose combination produces the element of $F_0$ under consideration (see the lower figure on the next page). All the elements of $F_0$ possess the required property of being composed of elements of $F_1$ and $F_2$, but only one of the 36 is EE. The resulting probability $P_{2,2} = 1/36$ is in agreement with the binomial formula (6).


I hope that it is not necessary to insist that the above results, namely

$$P\{\text{six, six} \mid F_2\} = 1/66 \quad \text{(Ex. 4)} \tag{7}$$

and

$$P\{\text{six, six} \mid F_0\} = 1/36 \quad \text{(Ex. 5)} \tag{8}$$

do not represent any sort of paradox. Both probabilities are calculated correctly, and they differ only because they refer to different probability sets, $F_2$ and $F_0$. This emphasizes the fact that probabilities refer to probability sets and that the failure to specify them properly may, and usually does, cause misunderstandings.





2. More general definition of probability. The above definitions and examples are probably sufficient to explain the basic ideas underlying the theory of probability when the fundamental probability set is finite. Let us now turn to the more general case and assume that the F.P.S., say (A), is anything, finite or infinite. As before, let us denote by (B) the set of elements of (A) that have some distinctive property B.

The definition of probability I am going to give will apply only to certain sets (A) and to certain properties B, not to all possible ones. In fact we shall require that the following postulates be satisfied by the class of subsets (B) of (A) which correspond to the properties B for which the probability will be defined. This class will be denoted by ((B)).


It will be assumed

(1) that the class ((B)) includes (A), so that (A) is an element of ((B));

(2) that for the class ((B)) it is possible to define a single-valued function m(B), called the measure of (B), wherefore the sets (B) belonging to the class ((B)) will be called measurable. The assumed properties of the measure are as follows:

(a) Whatever be (B) of the class ((B)), $m(B) \ge 0$.
(b) If (B) is empty (does not contain any single element), then it is measurable and $m(B) = 0$.
(c) The measure of (A) is greater than zero.
(d) If $(B_1), (B_2), \ldots, (B_n), \ldots$ is any at most denumerable set of measurable subsets, then their sum, $(\sum B_i)$, is also measurable. If no two of the subsets $(B_i)$ and $(B_j)$ (where $i \neq j$) have common elements, then $m(\sum B_i) = \sum m(B_i)$.
(e) If (B) is measurable, then the set $(\bar{B})$ of objects A not possessing the property B is also measurable and consequently, owing to (d), $m(B) + m(\bar{B}) = m(A)$.


Under the above conditions the probability $P\{B \mid A\}$ of an object A having the property B will be defined as the ratio

$$P\{B \mid A\} = \frac{m(B)}{m(A)}.$$

The probability $P\{B \mid A\}$, or $P\{B\}$ for short, may be called the absolute probability of the property B. Denote by $B_1B_2$ the property of A consisting in the presence of both $B_1$ and $B_2$. It is easy to show that if $(B_1)$ and $(B_2)$ are both measurable, then $(B_1B_2)$ will be measurable also. If $m(B_1) > 0$, then the ratio, say $P\{B_2 \mid B_1\} = m(B_1B_2)/m(B_1)$, will be called the relative probability of $B_2$ given $B_1$. This definition of the relative probability applies when the measure $m(B_1)$, as defined for the fundamental probability set (A), is not equal to zero. If, however, $m(B_1) = 0$, but we are able to define some other measure, say $m'$, applicable to $(B_1)$ and to a class of its subsets including $(B_1B_2)$, such that $m'(B_1) > 0$, then the relative probability of $B_2$ given $B_1$ will be defined as $P\{B_2 \mid B_1\} = m'(B_1B_2)/m'(B_1)$. Whatever may be the case, we shall have

$$P\{B_1B_2\} = P\{B_1\}\,P\{B_2 \mid B_1\} = P\{B_2\}\,P\{B_1 \mid B_2\}. \tag{9}$$


It is easy to see that if the fundamental probability set is finite, then the number of elements in any of its subsets will satisfy the definition of the measure. On the other hand, if (A) is the set of points filling up a certain region in n-dimensional space, then Lebesgue measure will satisfy the definition used here.
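The claim that counting measure satisfies the postulates can be checked mechanically for a small finite (A). The sketch below assumes a hypothetical six-element set. It takes every subset as the class ((B)), uses `len` as the measure m, checks postulates (1) and (a) to (e), and then checks formula (9) on one pair of subsets. For a finite (A), checking disjoint pairs is enough for (d): any denumerable family of disjoint subsets of a finite set has only finitely many non-empty members, and finite additivity follows from the pairwise case by induction.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(a):
    """All subsets of a finite set, as frozensets: here the class ((B))."""
    items = sorted(a)
    return [
        frozenset(c)
        for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    ]


A = frozenset(range(1, 7))  # hypothetical finite F.P.S. with six elements
cls = power_set(A)
m = len  # counting measure: the number of elements of a subset

assert A in cls                                    # postulate (1)
assert all(m(b) >= 0 for b in cls)                 # (a)
assert frozenset() in cls and m(frozenset()) == 0  # (b)
assert m(A) > 0                                    # (c)
for b1 in cls:                                     # (d): pairwise sums suffice
    for b2 in cls:                                 # because (A) is finite
        assert (b1 | b2) in cls
        if not (b1 & b2):
            assert m(b1 | b2) == m(b1) + m(b2)
for b in cls:                                      # (e): the complement (A) - (B)
    assert (A - b) in cls and m(b) + m(A - b) == m(A)


def P(b):
    return Fraction(m(b), m(A))  # P{B} = m(B)/m(A)


def P_given(b2, b1):
    return Fraction(m(b1 & b2), m(b1))  # P{B2|B1} = m(B1B2)/m(B1), needs m(B1) > 0


B1, B2 = frozenset({2, 4, 6}), frozenset({4, 5, 6})
assert P(B1 & B2) == P(B1) * P_given(B2, B1) == P(B2) * P_given(B1, B2)  # formula (9)
print(len(cls), P(B1 & B2))  # 64 1/3
```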


If the objects A are not actually points (e.g. if they are certain
lines, etc.) the above definitions of probability may be again applied,
provided it is possible to establish a one to one correspondence between
the objects A and other objects A', forming a class of sets where the
measure has already been defined. If (B) is any subset of (A) and (B')
the corresponding subset of (A'), then the measure of (B) may be defined
as being equal to that of (B'). It is known that such a definition of
the measure of subsets of (A) can be made in more than one way. Such
is for instance the case when the objects A are the chords in a circle
C of radius r and the property B consists of their length 2h exceeding
some specified value 2K. It may be useful to consider two of the possible
ways of treating this problem (Bertrand's problem).


1. Denote by x the angle between any fixed direction and the
radius perpendicular to any given chord A in a circle of radius r.
Further, let y be the perpendicular distance of the chord A from the
centre of the circle C. Now let A' denote a point on the xy plane with
coordinates x and y; then there will be a one to one correspondence
between the chords (A) of length 0 < 2h < 2r and the points of a
rectangle, say (A'), defined by the inequalities 0 ≤ x < 2π and 0 < y < r
(see the upper figure on the next page). The measure of the set of
chords with lengths exceeding 2K < 2r could be defined as equal to the
area of that part of (A') where 0 < y² < r² − K². Since the half-length
of the chord is h = √(r² − y²), it follows that the probability in which
we are interested is P{h > K} = √(1 − (K/r)²).


2. Denote by x and y the angles between a fixed direction and
the radii connecting the ends of any given chord A. If A'' denotes a
point on a plane with coordinates x and y, then there will be a one to one
correspondence between the chords of the system (A) and the points
within the parallelogram (A'') determined by the inequalities 0 < x < 2π
and x < y < x + π (see the lower figure of this page). The measure of
the set of chords with their lengths exceeding 2K may be defined as being
equal to the area of that part of (A'') where x + 2 arcsin(K/r) < y < x + π.
Starting with this definition, we find P{h > K} = 1 − (2/π) arcsin(K/r).
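The difference between the two solutions can be made tangible by imitating each definition of the measure with pseudo-random numbers. A minimal Python sketch, assuming K = r/2 and helper names of our own choosing (`exact_1`, `simulate_1`, and so on): sampling points uniformly over the rectangle (A') or over the parallelogram (A'') should yield relative frequencies close to √(1 − (K/r)²) ≈ 0.866 and 1 − (2/π) arcsin(1/2) = 2/3, respectively. A pseudo-random simulation is not a real experiment; it only shows what each measure prescribes.

```python
import math
import random

def exact_1(K, r):
    """Solution 1: P{h > K} = sqrt(1 - (K/r)^2)."""
    return math.sqrt(1 - (K / r) ** 2)

def exact_2(K, r):
    """Solution 2: P{h > K} = 1 - (2/pi) arcsin(K/r)."""
    return 1 - (2 / math.pi) * math.asin(K / r)

def simulate_1(K, r, trials, rng):
    """Points uniform in the rectangle (A'): 0 <= x < 2*pi, 0 <= y < r."""
    hits = 0
    for _ in range(trials):
        _direction = rng.uniform(0, 2 * math.pi)  # x does not affect the length
        y = rng.uniform(0, r)                      # distance of the chord from the centre
        if math.sqrt(r * r - y * y) > K:           # half-length h = sqrt(r^2 - y^2)
            hits += 1
    return hits / trials

def simulate_2(K, r, trials, rng):
    """Points uniform in the parallelogram (A''): 0 <= x < 2*pi, x < y < x + pi."""
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, 2 * math.pi)
        y = x + rng.uniform(0, math.pi)
        if r * math.sin((y - x) / 2) > K:          # half-length from the central angle y - x
            hits += 1
    return hits / trials

rng = random.Random(12345)
K, r = 0.5, 1.0
print(exact_1(K, r), simulate_1(K, r, 200_000, rng))
print(exact_2(K, r), simulate_2(K, r, 200_000, rng))
```

Because area on the parallelogram is the product of the lengths in x and in y − x, drawing x and the difference y − x independently and uniformly samples (A'') uniformly.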







[Figure (upper): Solution 1. Here the rectangle (A') represents the set of chords (A) of the circle; the chord length is 2h.]
[Figure (lower): Solution 2. Here the parallelogram (A'') represents the set of chords (A) of the circle.]


It is seen that the two solutions differ, and it may be asked
which of them is correct. The answer is that both are correct, but that
they correspond to different conditions of the problem. In fact the
question "what is the probability of a chord having its length greater
than 2K?" does not specify the problem entirely. This is only determined
when we define the measure appropriate to the set (A) and its subsets
to be considered. We may also describe this differently, using the
terms "random experiments" and "their results." We may say that to have
the problem of probability determined it is necessary to define the
method by which the randomness of an experiment is attained. Describing
the conditions of the problem concerning the length of a chord leading
to the 1st solution (upper figure on the preceding page), we could say
that when selecting at random a chord A, we first pick up at random the
direction of a radius, all directions being equally probable, and then,
equally at random, we select the distance between the centre of the
circle and the chord, all values between zero and r being equally
probable. It is easy to see what would be the description in the same
language of the random experiment leading to the 2nd solution (lower
figure on the preceding page).


We frequently use this way of speaking, but it is necessary to
remember that behind such words as, e.g., "picking up at random a
direction, all of them being equally probable," there is a definition of
the measure appropriate to the fundamental probability set and its
subsets. I want to emphasize that in all my writings a sentence like the
one in quotation marks, just written, is no more than a way of describing
the fundamental probability set and the appropriate measure. The
conception of "equally probable" is not in any way involved in the
definition of probability adopted, and it is a pure convention that the
statement

"In picking up at random a chord, we first select a direction, all
directions being equally probable; and then we choose a distance between
the centre of the circle and the chord, all values of the distance
between zero and r being equally probable,"

means no more and no less than

"For the purpose of calculating the probabilities concerning chords in a
circle, the measure of any set (A) of chords is defined as that of the
set (A') of points, each with coordinates x and y and such that for any
chord A in (A), x is the direction of the radius perpendicular to A and
y the distance of A from the centre of the circle. (A) is measurable
only if (A') is so."


However free we are in mathematical work to use words that we find
convenient, so long as they are clearly defined, our choice must be
justified in one way or another. The justification of the way of speaking
about the definition of the measure within the fundamental probability
set in terms of imaginary random experiments lies in the empirical fact
which Bortkiewicz* insisted on calling the "Law of big numbers". This law
says that, given a purely mathematical definition of a probability set
including the appropriate measure, we are able to construct a real
experiment, possible to carry out in any laboratory, with a certain range
of possible results and such that, if it is repeated many times, the
relative frequencies of these results and their different combinations in
small series approach closely the values of probabilities as calculated
from the definition of the fundamental probability set. Examples of
such real random experiments are provided by the experience of roulette*,
by the experiment of throwing a needle** so as to obtain an analogy to
the problem of Buffon, and by various sampling experiments based on
Tippett's random numbers***.
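Buffon's problem, mentioned above, gives the probability 2l/(πd) that a needle of length l ≤ d, thrown on a floor ruled with parallel lines a distance d apart, crosses a line. The Python sketch below is a stand-in under stated assumptions, not the laboratory experiment the text refers to: it replaces real throws with pseudo-random ones (distance of the needle's centre from the nearest line uniform on [0, d/2], acute angle with the lines uniform on [0, π/2]), and the function names are hypothetical.

```python
import math
import random

def buffon_exact(l, d):
    """P{needle crosses a line} = 2l / (pi d), valid for l <= d."""
    if l > d:
        raise ValueError("formula assumes needle length l <= line spacing d")
    return 2 * l / (math.pi * d)

def buffon_throws(l, d, throws, rng):
    """Relative frequency of crossings in a simulated series of throws."""
    crossings = 0
    for _ in range(throws):
        dist = rng.uniform(0, d / 2)           # centre to nearest line
        theta = rng.uniform(0, math.pi / 2)    # acute angle with the lines
        if dist <= (l / 2) * math.sin(theta):  # the needle reaches the line
            crossings += 1
    return crossings / throws

rng = random.Random(2024)
freq = buffon_throws(1.0, 2.0, 100_000, rng)
print(round(buffon_exact(1.0, 2.0), 4), round(freq, 4))
```

With l = 1 and d = 2 the theoretical value is 1/π; the relative frequency in a long simulated series should lie close to it.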


These examples show that random experiments corresponding, in the
sense described, to mathematically defined probability sets are possible.
However, they are frequently technically difficult. E.g., if we take any
coin and toss it many times, it is very probable that the frequency of
heads will not approach 1/2. To get this result we must select what
could be called a well balanced coin, and we have to work out an
appropriate method of tossing. Whenever we succeed in arranging the
technique of a random experiment such that the relative frequencies of
its different results in long series sufficiently approach, in our
opinion, the probabilities calculated from a fundamental probability set
(A), we shall say that the set (A) adequately represents the method of
carrying out the experiment.†


We shall now draw a few obvious but important conclusions from
the definition of probability adopted.


(1) If the fundamental probability set consists of only one
element, any probability calculated with regard to this set must have
the value either zero or unity.


(2) If all the elements of the fundamental probability set (A)
possess a certain property B₁, then the absolute probability of B₁, and
also its relative probability given any other property B₂, must be equal
to unity, so that P{B₁|A} = P{B₁} = P{B₁|B₂} = 1. On the other hand, if it
is known only that P{B₁} = 1, then it does not necessarily follow that
P{B₁|B₂} must be equal to unity.


* L. von Bortkiewicz: Die Iterationen, Julius Springer, Berlin, 1917.

** This is mentioned by E. Borel, Éléments de la Théorie des Probabilités,
Paris 1910, p. 106. I could not find the name of the performer of the
experiment.

*** L. H. C. Tippett: Random Sampling Numbers, Tracts for Computers,
No. XV, Edited by Karl Pearson, Cambridge, 1927. Price in England
3/9; in New York, at G. E. Stechert & Co., 31 East 10th Street,
$1.25 plus ...

† Cf. some remarks on page 18a.


3. Random Variables. We may now proceed to the definition of a
random variable. We shall say that x is a random variable if it is a
single valued measurable function (not a constant) defined within the
fundamental probability set (A), with the exception perhaps of a set of
elements of measure zero. We shall consider only cases where x is a real
numerical function. If x is a random variable, then its value
corresponding to any given element A of (A) may be considered as a
property of A, and whatever the real numbers a < b, the definition of (A)
will allow the calculation of the probability, P{a ≤ x < b}, of x having
a value such that a ≤ x < b.


We notice also that as x is not constant in (A), it is possible
to find at least one pair of elements, A₁ and A₂ of (A), such that the
corresponding values of x, say x₁ < x₂, are different. If we denote by
B the property distinguishing both A₁ and A₂ from all other elements of
(A), and if a < b are two numbers such that a < x₁ < b < x₂, then
P{a ≤ x < b|B} = 1/2. It follows that if x is a random variable in the
sense of the above definition, then there must exist such properties B
and such numbers a < b that 0 < P{a ≤ x < b|B} < 1.


It is obvious that the above two properties are equivalent to the
definition of a random variable. In fact, if x has the properties (a)
that, whatever a < b, the definition of the fundamental probability set
(A) allows the calculation of the probability P{a ≤ x < b}, and (b) that
there are such properties B and such numbers a < b that
0 < P{a ≤ x < b|B} < 1, then x is a random variable in the sense of the
above definition.


The probability P{a ≤ x < b}, considered as a function of a and b,
will be called the integral probability law of x.
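The definition of a random variable, its integral probability law, and the contrast with a constant can be tried out on a small finite F.P.S. with counting measure. In this Python sketch the six-element set (A), the values of x and θ, and the function `integral_law` are hypothetical illustrations. It reproduces the argument above: a property B that singles out two elements with x₁ < x₂ gives P{a ≤ x < b|B} = 1/2 when a < x₁ < b < x₂, whereas for the constant θ every such probability is 0 or 1.

```python
from fractions import Fraction

# Hypothetical finite F.P.S. with counting measure, and two functions on it:
# x (not constant, so a random variable) and theta (a constant).
A = ["A1", "A2", "A3", "A4", "A5", "A6"]
x = {"A1": 0.5, "A2": 2.0, "A3": 2.0, "A4": 3.5, "A5": 3.5, "A6": 3.5}
theta = {e: 2.0 for e in A}

def integral_law(var, a, b, given=None):
    """P{a <= var < b | given} under counting measure; given=None means the whole of (A)."""
    base = A if given is None else [e for e in A if e in given]
    hits = [e for e in base if a <= var[e] < b]
    return Fraction(len(hits), len(base))

# The integral probability law of x, evaluated for one pair a < b.
print(integral_law(x, 1.0, 3.0))            # 1/3 (elements A2 and A3)

# B singles out A1 and A2, where x1 = 0.5 < x2 = 2.0; take a < x1 < b < x2.
B = {"A1", "A2"}
print(integral_law(x, 0.0, 1.0, given=B))   # 1/2, strictly between 0 and 1

# For the constant theta every such probability is 0 or 1.
values = {integral_law(theta, a, b, given=g)
          for a, b in [(0.0, 1.0), (1.0, 3.0), (2.0, 2.5)]
          for g in (None, B)}
print(values)
```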


A random variable is contrasted with a constant, say θ, the numerical
values of which corresponding to all elements of the set (A) are all
equal. If θ is a constant, then whatever a < b and B, the probability
P{a ≤ θ < b|B} may have only the values unity or zero, according to
whether θ falls between a and b or not.


Keeping in mind the above definitions of the variables, in discussing
them we may speak in terms of random experiments. In the sense of the
convention adopted above, we may say that x is a random variable when its
values are determined by the results of a random experiment.


It is important to keep a clear distinction between random variables
and unknown constants. The 1000th decimal, x₁₀₀₀, in the expansion
of π = 3.14159... is a quantity unknown to me, but it is not a random
variable, since its value is perfectly fixed, whatever fundamental
probability set we choose to consider. We could say alternatively that
the value of x₁₀₀₀ does not depend upon the result of any random
experiment.
Frequently we have to consider simultaneously several random
variables

$$x_1, x_2, \ldots, x_n \qquad (10)$$

and their simultaneous integral probability law, to be defined as follows.

Denote by E the set of values of the x variables (10). This set
could be represented by a point*, to be called the sample point E, in an
n-dimensional space, say W, the rectangular coordinates of the point
being the values x₁, x₂, ..., xₙ. The space W will be called the sample
space. Denote by w any region in W and accept the convention that

$$E \in w$$

stands for the words: "the point E is an element of w".

If the xᵢ are random variables, then whatever be w, we may speak
of the probability of E being an element of w, and denote it by

$$P\{E \in w\}.$$

In fact this probability will be represented by the ratio of the measure
of that part, say F(w), of the F.P.S. in which the xᵢ have values
locating the point E within the boundaries of w, to the measure of the
F.P.S. itself. It must of course be assumed that F(w) is measurable. With
that restriction the probability P{E ∈ w} is defined for every region w.
This probability, considered as a function of the region w, is called the
simultaneous integral probability law of the xᵢ.


Apart from, or instead of, the integral probability law we may
frequently consider another function called the elementary probability
law of the random variables. This is defined as follows.

If P{E ∈ w} stands for the integral probability law of the
variables (10), and if there exists a function p(E) of the xᵢ such that,
whatever be w,

$$P\{E \in w\} = \int \cdots \int_w p(E)\, dx_1\, dx_2 \cdots dx_n \qquad (11)$$

then the function p(E) is called the elementary probability law of the
random variables (10).
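Formula (11) can be checked numerically in a case where the answer is known. This Python sketch assumes, for illustration only, that two variables have the elementary law of two independent standard normal variables and that w is the rectangle [−1, 0.5] × [0, 2]. It approximates the double integral of p(E) over w by the midpoint rule and compares the result with the value obtained from the normal distribution function in Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def p(x1, x2):
    """Elementary probability law assumed for illustration: two independent N(0, 1)."""
    return math.exp(-(x1 * x1 + x2 * x2) / 2) / (2 * math.pi)

def integrate_over_rectangle(f, a1, b1, a2, b2, steps=400):
    """Midpoint-rule approximation of the double integral of f over w = [a1,b1] x [a2,b2]."""
    h1, h2 = (b1 - a1) / steps, (b2 - a2) / steps
    total = 0.0
    for i in range(steps):
        u = a1 + (i + 0.5) * h1
        for j in range(steps):
            total += f(u, a2 + (j + 0.5) * h2)
    return total * h1 * h2

Phi = NormalDist().cdf
a1, b1, a2, b2 = -1.0, 0.5, 0.0, 2.0
numeric = integrate_over_rectangle(p, a1, b1, a2, b2)
exact = (Phi(b1) - Phi(a1)) * (Phi(b2) - Phi(a2))
print(round(numeric, 5), round(exact, 5))
```

A rectangle is chosen only because its exact probability factorizes; formula (11) itself applies to any region w.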


It will be noticed that while the integral probability law is a
function of the region w, the elementary probability law is a function
only of the point E. It will be noticed also that p(E) may be considered
as being defined in the whole sample space and non-negative. Of course
there are cases where no elementary probability law in the above sense
exists; this, however, happens rarely in problems of statistics.

* It is convenient to recall here that mathematicians define a point
as a set of numbers.


It is important to know a few simple rules for dealing with
elementary probability laws.

(1) If p(x₁, x₂, ..., xₙ) and p(x₁, x₂, ..., xₙ₋₁) are the elementary
probability laws of

$$x_1, x_2, \ldots, x_n \quad \text{and} \quad x_1, x_2, \ldots, x_{n-1} \qquad (12)$$

respectively, then

$$p(x_1, x_2, \ldots, x_{n-1}) = \int_{-\infty}^{+\infty} p(x_1, x_2, \ldots, x_{n-1}, x_n)\, dx_n \qquad (13)$$

This rule, applied repeatedly, permits the calculation of the elementary
probability law of any single one of the xᵢ whenever their simultaneous
probability law is known.
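Rule (13) can be illustrated with a joint law whose marginals are known. Assuming, for this sketch only, that (x₁, x₂) has a normal law with unit variances and correlation 0.6, integrating out x₂ numerically (a midpoint rule over a wide finite interval standing in for (−∞, +∞)) should return the standard normal density of x₁. The names `p_joint`, `p_marginal` and `RHO` are hypothetical.

```python
import math
from statistics import NormalDist

RHO = 0.6  # assumed correlation, for illustration only

def p_joint(x1, x2, rho=RHO):
    """Bivariate normal density with unit variances and correlation rho."""
    q = (x1 * x1 - 2 * rho * x1 * x2 + x2 * x2) / (1 - rho * rho)
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(1 - rho * rho))

def p_marginal(x1, lo=-12.0, hi=12.0, steps=4000):
    """Rule (13): integrate the joint law over the last variable (midpoint rule)."""
    h = (hi - lo) / steps
    return h * sum(p_joint(x1, lo + (k + 0.5) * h) for k in range(steps))

for x1 in (-1.5, 0.0, 0.7):
    print(x1, round(p_marginal(x1), 6), round(NormalDist().pdf(x1), 6))
```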


(2) If there are two sets of n random variables each,

$$x_1, x_2, \ldots, x_n \qquad (14)$$

and

$$y_1, y_2, \ldots, y_n \qquad (15)$$

such that each of the xᵢ is a function of the yⱼ possessing continuous
partial derivatives with regard to any yⱼ, the Jacobian

$$J = \frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(y_1, y_2, \ldots, y_n)} \qquad (16)$$

existing and being different from zero almost everywhere and not changing
its sign, then the probability laws p(x₁, ..., xₙ) and p(y₁, ..., yₙ) of the
variables (14) and (15), respectively, are connected by the identity

$$p(y_1, y_2, \ldots, y_n) = p(x_1, x_2, \ldots, x_n)\,|J| \qquad (17)$$

where in the right-hand side the xᵢ will ordinarily be expressed in terms
of the yᵢ.

Combining the two above rules we may calculate the probability
law of various functions, f(E), of the xᵢ whenever their simultaneous
probability law is known: taking f(E) as one of a suitable set of new
variables y₁, ..., yₙ, we apply (17) and then integrate out the remaining
y's by (13).
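As a sketch of rule (17), and of combining it with (13), take the hypothetical transformation y₁ = x₁ + x₂, y₂ = x₁ − x₂ of two independent standard normal variables. Then x₁ = (y₁ + y₂)/2, x₂ = (y₁ − y₂)/2 and J = −1/2, which never changes sign. The Python code below evaluates p(y₁, y₂) by (17) and then integrates out y₂ by (13) to obtain the law of the function f(E) = x₁ + x₂; both are compared with the known normal law of variance 2. The example and all names in it are ours, not the lecture's.

```python
import math
from statistics import NormalDist

def p_x(x1, x2):
    """Assumed law of (14): x1, x2 independent N(0, 1)."""
    return math.exp(-(x1 * x1 + x2 * x2) / 2) / (2 * math.pi)

def x_of_y(y1, y2):
    """Inverse of the hypothetical transformation y1 = x1 + x2, y2 = x1 - x2."""
    return (y1 + y2) / 2, (y1 - y2) / 2

# Jacobian d(x1, x2)/d(y1, y2) = det[[1/2, 1/2], [1/2, -1/2]] = -1/2 (constant sign).
ABS_J = 0.5

def p_y(y1, y2):
    """Identity (17): p(y1, y2) = p(x1, x2) |J|, with the x's written in terms of the y's."""
    return p_x(*x_of_y(y1, y2)) * ABS_J

def p_y1(y1, lo=-15.0, hi=15.0, steps=3000):
    """Rule (13) applied to p_y: the law of the single function f(E) = x1 + x2."""
    h = (hi - lo) / steps
    return h * sum(p_y(y1, lo + (k + 0.5) * h) for k in range(steps))

N2 = NormalDist(0, math.sqrt(2))  # known law of the sum of two independent N(0, 1)
print(round(p_y(0.3, -1.2), 6), round(N2.pdf(0.3) * N2.pdf(-1.2), 6))
print(round(p_y1(1.0), 6), round(N2.pdf(1.0), 6))
```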


In order to clear the way for the material involved in the following
lectures, I shall finish this one by giving the definitions relating
to statistical hypotheses.

Consider the set of random variables

$$x_1, x_2, \ldots, x_n \qquad (10)$$

Any assumption concerning their probability law (either integral or
elementary) is called a statistical hypothesis.

A statistical hypothesis is called simple if it specifies the
integral probability law, P{E ∈ w}, of the xᵢ as a single valued function
of the region w.

Any statistical hypothesis that is not simple is called composite.
It may be useful to illustrate these definitions by some examples.

The assumption H₁ that

$$p(E) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\sum_{i=1}^{n} (x_i - \mu)^2 / 2\sigma^2} \qquad (18)$$

where neither μ nor σ > 0 is specified, is a composite statistical
hypothesis. In fact, if w denotes a region defined by the inequality
[...], then

$$P\{E \in w\} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \int \cdots \int_w e^{-\sum_{i=1}^{n} (x_i - \mu)^2 / 2\sigma^2}\, dx_1\, dx_2 \cdots dx_n \qquad (19)$$

is not uniquely determined but is a function of the parameters μ and σ,
which are left unspecified by the hypothesis H₁.

On the other hand, the assumption H₂ that the elementary probability
law of the xᵢ is as given by formula (18) but with μ = 0 and σ = 1 is
already a simple hypothesis. In fact, whatever the region w in the sample
space, substituting μ = 0 and σ = 1 in (19), we shall be able to calculate
the unique numerical value of P{E ∈ w}, though sometimes this may be
connected with great technical difficulties.
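The difference between H₁ and H₂ can be seen numerically for a particular region. This Python sketch assumes n = 2 and a hypothetical region w = {x₁ ≤ 1, x₂ ≤ 1}, chosen because (19) then factorizes into one-dimensional normal integrals; it is not the region of the lecture. Under H₂ the probability P{E ∈ w} is a single number, while under H₁ it changes with μ and σ.

```python
from statistics import NormalDist

def prob_w(mu, sigma, c=1.0, n=2):
    """P{E in w} under law (18) for the hypothetical region w = {x_i <= c for every i}.

    For this rectangular region the n-fold integral (19) factorizes into a power
    of the one-dimensional normal integral."""
    return NormalDist(mu, sigma).cdf(c) ** n

# Simple hypothesis H2 (mu = 0, sigma = 1): one definite number.
print(round(prob_w(0.0, 1.0), 4))
# Composite hypothesis H1: the value changes with the unspecified mu and sigma.
for mu, sigma in [(0.0, 1.0), (0.5, 1.0), (0.0, 3.0)]:
    print(mu, sigma, round(prob_w(mu, sigma), 4))
```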


- 18a - 
With reference to the discussion on the law of great numbers,
particularly in line with certain statements on page 14, it might be
useful to remark that for any mathematical theory of probability, which
will necessarily always be based on a given F.P.S., it is possible with
sufficient care to arrange a set of experiments such that when a long
series of them is taken into consideration, the results will approach
the theory satisfactorily. For instance, it would be possible to arrange
a real laboratory experiment in which chords of a circle are picked up
"at random" in such a way that if the performance is repeated many
times, the relative frequencies in various classes of lengths of chords
actually picked up will approach those of the first solution of Bertrand's
problem (page 11 and the upper figure of page 12). It would also be
possible to arrange another experiment in which the frequencies in the
same class intervals approach those of the second solution (page 11 and
the lower figure of page 12). To assume, without actual experience and
comparison (see also Lecture II, particularly page 22), that a series
of real experiments conforms in any degree to the theory of probability
based on a particular F.P.S. is presumptuous; and failure to recognize
this point has more than once brought grave difficulties and
inconsistencies. Editor.


An article that should be mentioned in connection with Lectures
I and II is one by D. J. Struik, "On the foundations of the theory of
probabilities," Philosophy of Science 1, 50-70, 1934 (letter from
Dr. Neyman dated 7th January 1938).
LECTURE II: ON PROBABILITY AND EXPERIMENTATION


1. Abstract character of mathematical theories and possibilities
of applications. It is probable that many listeners at my first lecture
were disappointed. They are engaged in various applications of
probability to practical problems, and such problems must be the only
cause of their interest in the theory of probability. They may feel that
they have no use for a theory in which "experiments," "results," etc.,
everything that is of utmost importance to them, are treated only as
picturesque descriptions of probability sets and measures. Those may be
good for mathematicians, they would say, but we want a mathematical theory
dealing with actual experiments, not with abstract probability sets.


It may be useful to start this lecture by considering more closely
whether it is at all possible to satisfy that part of the audience which
is of the opinion described. One might put the question this way: Is
it possible to produce a mathematical theory dealing with actual
experiments or, more generally, with phenomena of actual life?


My answer is: Probably never. That is, unless the word
mathematics changes its present meaning. The objects in a real world,
or rather our sensations connected with them, are always more or less
vague, and since the time of Kant it has been realized that no general
statement concerning them is possible. The human mind grew tired of
this vagueness and constructed a science from which anything that is
vague is excluded--this is mathematics. But the gain in generality
must be paid for, and the price is the abstractness of the conceptions
with which mathematics deals and the hypothetical character of the
results: if A is B and B is C, then A is also C.

Of course, there are many mathematical theories that are successfully
applied to practical problems. But this does not mean that these
theories deal with real objects. If they did, they could not involve
general statements and could not be considered as mathematical. Let us
illustrate this by a few examples. Modern geometry is a mathematical
science and is applied to practical problems. But does it deal with
objects that we meet in actual life? Let us see. Geometry deals with
such conceptions as planes, straight lines, points, etc. Is there
anything in real life that is exactly a plane in the sense of geometry?

We say sometimes that the surface of this table is a plane. But if we
look at this surface through a good magnifying glass we shall immediately
say that it is certainly not plane. If we say that it is, we mean that
for practical purposes it could be considered as a plane.


Here we come to the essential point: when we apply mathematics to
practical problems we never seek (and if we did, we should never succeed)
to find an identity between mathematical conceptions and some realities;
we are satisfied at finding some correspondence between them, by which
a mathematical formula can be interpreted in terms of realities and give
a result which, for practical purposes, would in our opinion be
sufficiently accurate.


Consider a triangle T₁ formed by three points on this sheet of
paper. Divide it by some lines into four smaller triangles T₂, T₃, T₄,
and T₅. If we state numerically the coordinates of all the vertices,
we shall be able to apply known formulas and calculate the areas of all
the five triangles. Naturally the area of T₁ will be equal to the sum of
the areas of the other four. This is geometry. But now take any
implements you desire and measure the sides of all the triangles as
actually drawn. Using those measurements and again applying formulas,
we may be disappointed to find that the area of T₁ so calculated is not
exactly equal to the sum of the areas of T₂, T₃, T₄, and T₅.


It will be suggested that this is due to the errors of measurement.
That is true so far as the expression "errors of measurement" stands
for something broader, including the fact that the dots representing the
vertices of the triangles are not the points we consider in mathematics.
However, for many practical purposes the agreement between the area of
T₁ and the sum of the areas of T₂, T₃, T₄, and T₅ will be judged
satisfactory, and this is the decisive point in the question whether the
mathematical theory of geometry can be applied in practice.
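The triangle illustration can be imitated in a few lines of Python. Everything here is an assumption made for the sake of the sketch: the vertices, the division of T₁ into four triangles by joining the midpoints of its sides, Heron's formula as the "known formula" for areas from measured sides, and the modelling of measurement as adding a small Gaussian error to each side. With exact lengths the areas add up to within rounding; with "measured" lengths they agree only approximately. Real discrepancies, as the text stresses, have broader causes than such a simple error model.

```python
import math
import random

def heron(a, b, c):
    """Area of a triangle from its three side lengths (Heron's formula)."""
    s = (a + b + c) / 2
    return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))

def area(p, q, r, measure=lambda d: d):
    """Area from the side lengths; `measure` maps each true length to a 'measured' one."""
    return heron(measure(math.dist(p, q)), measure(math.dist(q, r)), measure(math.dist(r, p)))

def midpoint(u, v):
    return ((u[0] + v[0]) / 2, (u[1] + v[1]) / 2)

# Hypothetical T1 = ABC, divided into T2..T5 by joining the midpoints of its sides.
A, B, C = (0.0, 0.0), (10.0, 0.0), (3.0, 8.0)
D, E, F = midpoint(A, B), midpoint(B, C), midpoint(C, A)
parts = [(A, D, F), (D, B, E), (F, E, C), (D, E, F)]

# "Geometry": exact lengths, so the four areas add up to that of T1 (up to rounding).
exact_T1 = area(A, B, C)
exact_sum = sum(area(*t) for t in parts)

# "Measurement": each side is read with a small Gaussian error (an assumed model).
rng = random.Random(7)

def measured(d):
    return d + rng.gauss(0.0, 0.02)

measured_T1 = area(A, B, C, measured)
measured_sum = sum(area(*t, measure=measured) for t in parts)
print(exact_T1, exact_sum)
print(measured_T1, measured_sum)
```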


A similar examination of other mathematical theories applied to
practical problems will reveal the same features. The theory itself
deals with abstract conceptions not existing in the real world. But
there are real objects that correspond to these abstract conceptions in
a certain sense, and numerical values of mathematical formulas more or
less agree with the results of actual measurements. In earlier stages
of any branch of mathematically treated natural science we are satisfied
with only a slight resemblance between mathematical and empirical results,
but later on our requirements become more and more stringent.


After this general introduction we may turn to the
main topic of this lecture, which is whether and how the mathematical
theory of probability can be usefully applied in natural science.


2. Random experiments and the empirical law of big numbers.
It follows from what I said that the foundations of the theory of
probability could be chosen in many ways. But however they are chosen, if
their accuracy is on the level now customary in mathematics, the theory
of probability will deal with abstract conceptions and not with real
objects of any kind. Therefore the application of such a theory will be
possible only if there can be established a bridge, or a correspondence,
between the conceptions of the theory and the real facts. The actual
applications must be preceded by numerous checks and rechecks of the
permanency and the accuracy of such correspondence. If this is judged
sufficiently accurate and found sufficiently permanent, then the
predictions--the final aim of any science--based on the mathematical
theory of probability will have some chance of success. Otherwise the
theory may be interesting by itself, but useless from the point of view
of application.
What is then the category of facts that correspond to the conceptions
of the theory of probability as described in my first lecture? What is
the meaning of that correspondence?


The category of these facts may be described as the results of
random experiments. It is impossible to give an exact definition of
experiments that are called random. Equally, it would be impossible to
give a definition of those objects in the real world deserving the
description "plane", "straight line", etc. Exact definitions can be given
only if, instead of speaking of real objects, we speak of abstract
conceptions. At most we can give a rough description, illustrating it
with some examples so as to appeal to the intuition. In what follows,
unless otherwise stated, whenever I shall speak of experiments I shall
mean real experiments, not hypothetical ones.

There are experiments which, even if carried out repeatedly with
utmost care to keep the conditions constant, yield varying results.
They are "random".


(a) We may construct a special machine to toss coins. This
machine may be very strong, driven by an electrical motor so as to
impart a constant initial velocity to the coin. The experiments may be
carried on in a closed room with no noticeable air currents; the coin
may be put into the machine always in the same way; and then, I am
practically certain, the results of the repeated experiments will
vary. Perhaps very frequently we may get heads, but from time to time
the coin will fall tails. The experimenter may be inclined to think
that these cases arise from some "error of experimentation".


(b) Another example of this kind is provided by the roulette.
A well constructed roulette with an electrically regulated start will
yield varying results.


(c) Those were types of random experiments arranged by men.
There are some going on spontaneously. Consider a quantity of radioactive
matter and the α particles it emits in some specified direction within
a cone of small solid angle. These particles could be recorded by the
fluorescence they produce when falling on an appropriate screen. Let
us observe this screen for several consecutive minutes, one minute's
observations being considered as a single experiment. It will be found
that, however constant be the conditions of the consecutive experiments,
their results will vary, in that the number of disintegrations recorded
per minute will not be the same.


(d) Another example of this kind is provided by the varying
properties of organisms forming an F₂ generation, however homogeneous
be the conditions of breeding.


These examples may make it sufficiently clear what I mean by
random experiments. Now I shall explain in what sense their results
correspond to the conceptions involved in the theory of probability.
Let N be a fairly large number, say 1000 or so, and n any other
positive integer. Let us perform a long series of Nn random experiments
of the type described, and count the cases where a certain specified
result E occurred. Let it be in M cases. Dividing M by Nn we shall obtain
the ratio

$$f = \frac{M}{Nn} \qquad (1)$$

which will be called the relative frequency of the result E in the course
of Nn trials. These Nn trials will be called experiments of the first
order. Now divide the whole series of Nn first order experiments into
N groups of n trials each, in the order in which the trials were carried
out. Each such group of n first order trials will now be considered as
a trial of the second order.


The second order trials could be classified according to the
number k of occurrences of the result E in the n first order trials of
which they are formed. Obviously k could be equal to 0, 1, 2, ..., n
in any one of the second order trials. Let mₖ denote the number of
second order trials in which E occurred exactly k times, and

$$F_{n,k} = \frac{m_k}{N} \qquad (2)$$

the relative frequency in the series of second order trials.


It is a surprising and very important empirical fact that whenever
sufficient care is taken to carry out the first order experiments in as
uniform conditions as possible, and the number N is large, then the
relative frequency $F_{n,k}$ appears to be very nearly equal to the
familiar formula

$$F_{n,k} \approx \frac{n!}{k!\,(n-k)!}\, f^{k} (1-f)^{n-k}$$


In other words, the relative frequency $F_{n,k}$ relating to a series
of second order experiments is connected with the relative frequency f
of the first order experiments very nearly in the same way as the
probability $P_{n,k}$, discussed in my first lecture (page 7) and relating
to the second order probability set, is connected with the probability p
referring to the corresponding first order probability set.
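The bookkeeping of first and second order trials can be written down directly. In this Python sketch pseudo-random numbers stand in for real castings of a die, and the probability 0.25 of a six, N = 20 000, n = 2, and the function names are assumptions for illustration. It computes f by formula (1) and $F_{n,k}$ by formula (2), and compares the latter with the binomial formula. For a simulation the agreement is a mathematical consequence of the generator; for real experiments, as the text says, it is an empirical fact.

```python
import math
import random
from collections import Counter

def second_order_frequencies(outcomes, n):
    """Group first order results (True when E occurred) into consecutive groups of n,
    in the order of the trials, and return [F_{n,0}, ..., F_{n,n}] with F_{n,k} = m_k / N."""
    N = len(outcomes) // n
    m = Counter(sum(outcomes[g * n:(g + 1) * n]) for g in range(N))
    return [m[k] / N for k in range(n + 1)]

def binomial_formula(f, n):
    """n! / (k! (n - k)!) * f^k * (1 - f)^(n - k) for k = 0, ..., n."""
    return [math.comb(n, k) * f ** k * (1 - f) ** (n - k) for k in range(n + 1)]

# Simulated stand-in for N*n castings of a die that (by assumption) shows six
# with probability 0.25 rather than 1/6; E = "six points on the upper side".
rng = random.Random(2)
N, n = 20_000, 2
outcomes = [rng.random() < 0.25 for _ in range(N * n)]
f = sum(outcomes) / (N * n)        # formula (1): f = M / Nn
F = second_order_frequencies(outcomes, n)
print([round(v, 4) for v in F])
print([round(v, 4) for v in binomial_formula(f, n)])
```

Taking n = 2 mirrors the division into consecutive pairs of castings used in the example that follows.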


In order to avoid misunderstanding, let us describe the situation
in greater detail. Suppose that the random experiment under consideration
consists in 2N castings of the same die, and that f is the relative
frequency of cases where the upper side of the die had six points on it.
The value of f may be close to 1/6 or not. It may in fact differ
considerably from 1/6, depending on the structure of the die and the exact
conditions of casting. But if we split the whole series of trials into
consecutive pairs, then the proportions of pairs with 0, 1, and 2 sixes will
The above fact, which has been found empirically* many times, may
be described in a more general way by saying that usually the single
random experiments and various groups of these experiments behave as if
they tended to reproduce certain first order probability sets,
corresponding to first order trials, and an appropriate second order
probability set. This fact may be called the empirical law of big numbers.
I want to emphasize that this law applies not only to the simple case
discussed above, connected with the binomial formula, but seems to be
perfectly general, in the same sense in which we use the word general with
respect to any other "general law" observed in the outside world. Whenever
it fails, we explain it by suspecting a "lack of randomness" in the first
order trials.

Suppose that, having repeatedly performed series of random
experiments of some specified kind, we have always found that they do
conform to the empirical law of big numbers. Then, as it is our custom
to do, we expect them to behave similarly in the future, and the
calculus of probability to permit us to make successful predictions of
the frequencies of results of future series of experiments.

This is the way in which the abstract theory of probability
described in my first lecture may be put into correspondence with
happenings in the outside world, and how it may be, and actually is,
applied to solve problems of practical importance. The standing of the
theory of probability is, in this respect, no different from that of any
other branch of mathematics. The application of the theory involves the
following steps.


(i) Wishing to treat certain phenomena by means of the theory
of probability, we must find some element of those phenomena that could
be considered as random, following the law of big numbers. This
involves the construction of a mathematical model of the phenomena
involving one or more probability sets.
(ii) The mathematical model may be satisfactory or not. This
must be checked by observation.
(iii) If the mathematical model is found satisfactory, then it
may be used for deductions concerning phenomena to be observed in the
future.


Let us illustrate these steps by a few examples taken from the
current literature.




* See, for example, L. von Bortkiewicz: Die Iterationen,
Julius Springer, Berlin, 1917.


3. Illustrations. Example 1. Two bacteriologist friends of
mine, Miss J. Supinska and Dr. T. Matuszewski, were interested in learning
whether the calculus of probability could be applied to certain problems
concerning the colonies of bacteria on a Petri plate. The diagram
reproduces a photograph of a Petri plate with colonies that are visible
as dark spots. You will notice also that the plate is divided into a
number of small squares. In order to explain the particular mathematical
model that was tried in this instance, consider the contents v of one
particular square and one particular living bacterium B contained in the
liquid that was poured on to the plate. All the operations performed
with the liquid and the plate resulting in fixing the bacterium B at
some point are considered in the mathematical model as a first order
experiment, which may result either in B falling within v, or not. If
there were N living bacteria in the liquid poured on to the plate, then
there were N such first order experiments, all relating to the same
square v. These form a single second order experiment. Finally, if the
number of squares into which the plate is divided be n, then there will
be n second order experiments, which together could be considered as one
third order experiment. Without going into further details of this
mathematical model I shall state that it implies that the probability of
any of the squares containing exactly k colonies must be approximately
equal to the Poisson formula





$$P_k = \frac{\lambda^k e^{-\lambda}}{k!} \qquad (5)$$


where λ means the average number of colonies per square. The reader
will notice that the above k satisfies the definition of a random
variable, the integral probability law of which is given by


$$P\{a \le k \le b\} = \sum_{k=a}^{b} \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{for } 0 \le a \le b \qquad (6)$$
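Formulas (5) and (6) can be evaluated directly. The following is a minimal Python sketch; the function names and the illustrative value λ = 2 are assumptions made for this example, not values taken from the experiments.

```python
import math

# Illustrative sketch; function names and the value of lam are hypothetical.


def poisson_pk(k: int, lam: float) -> float:
    """Formula (5): probability that a square contains exactly k colonies,
    lam being the average number of colonies per square."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_between(a: int, b: int, lam: float) -> float:
    """Formula (6): P{a <= k <= b}, obtained by summing (5) from k = a to k = b."""
    if not 0 <= a <= b:
        raise ValueError("formula (6) requires 0 <= a <= b")
    return sum(poisson_pk(k, lam) for k in range(a, b + 1))


# Hypothetical illustration: an average of 2 colonies per square.
lam = 2.0
print([round(poisson_pk(k, lam), 4) for k in range(5)])
print(round(poisson_between(0, 2, lam), 4))
```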


If this mathematical model could be assumed to correspond accurately to
the actual experiments, in the sense of the word explained above, then
it could be used for predicting frequencies of certain circumstances
that are important in bacteriology. One of the questions that my
colleagues had in mind was how frequently a single colony is produced
by two or more unconnected bacteria.


In order to settle the question whether the number k of colonies
within a square could be considered as a random variable, and whether its
probability law could be represented by formula (5), my colleagues
performed a series of experiments summarized in the following table.
Zentralblatt für Bakteriologie, Parasitenkunde und Infektionskrankheiten,
II. Abteilung, 1936, Bd. 95

Comparison of distributions of colonies with the
Poisson law
T. Matuszewski, J. Supinska und J. Neyman
The values of k are the numbers of colonies within the squares
into which the whole plate was divided; m' and m denote the observed
and the expected numbers of squares having k colonies.
The kind of bacteria analyzed is stated at the top of each pair of
columns. The last two lines give measures of the goodness of fit: the
chi-square and the corresponding P. It is seen that without exception
the agreement between the observed and theoretical frequencies, the
latter obtained by multiplying the P_k of formula (5) by the total number
of squares on the plate, is surprisingly good. As a matter of fact, the
total number of similar experiments carried out up to the present is much
larger, and in not a single case has any serious disagreement between
the distribution of colonies and the Poisson law been recorded. This
entitles us to expect that the results of future experiments will be
similar, and that the conclusions concerning those future experiments,
drawn from the mathematical model described above, will be correct, or
good enough.
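To see how the model behind the table works, one can simulate it. The sketch below is an illustration under assumed conditions, not a re-analysis of the bacteriological data: it scatters N hypothetical bacteria independently and uniformly over n squares (each bacterium being a first order experiment), tallies the number k per square, and compares the observed numbers of squares m' with the expected numbers m = n·P_k, pooling all k ≥ 6 into one class. The function names, n = 400, N = 800 and the random seed are assumptions for the example.

```python
import math
import random
from collections import Counter

# Illustrative sketch; names, plate size and seed are hypothetical.


def simulate_plate(n_squares: int, n_bacteria: int, rng: random.Random) -> list[int]:
    """Model run: each bacterium (a first order experiment) settles in one
    of n_squares equally likely squares; returns the count k per square."""
    counts = [0] * n_squares
    for _ in range(n_bacteria):
        counts[rng.randrange(n_squares)] += 1
    return counts


def compare_with_poisson(counts: list[int], max_k: int):
    """Observed numbers of squares m' with k = 0, 1, ..., max_k - 1 colonies
    and with k >= max_k, the expected numbers m = n * P_k from (5), and the
    chi-square sum of (m' - m)^2 / m over these classes."""
    n = len(counts)
    lam = sum(counts) / n  # lambda estimated by the mean number per square
    tally = Counter(min(k, max_k) for k in counts)
    observed = [tally.get(k, 0) for k in range(max_k + 1)]
    probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_k)]
    probs.append(1.0 - sum(probs))  # pooled class k >= max_k
    expected = [n * p for p in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, expected, chi2


# Hypothetical plate: 400 squares, 800 bacteria, fixed seed for repeatability.
rng = random.Random(1)
counts = simulate_plate(n_squares=400, n_bacteria=800, rng=rng)
observed, expected, chi2 = compare_with_poisson(counts, max_k=6)
print(observed)
print([round(e, 1) for e in expected])
print(round(chi2, 2))
```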


If the model implies that in a particular case the probability of
a colony arising from more than one independently floating individual
is, for instance, P = .001, we may conclude that about 99.9 percent of the
colonies were produced by one individual only.


For the sake of clarity I may mention that in the above statement
"one individual" does not necessarily mean one cell. This expression
refers to one or more cells that are floating together, being connected
either mechanically or biologically.


Example 2. The following table is reproduced from an article in
Biometrika, and represents a comparison between the Poisson law, Eq. (5),
and the distribution of dodder in samples of clover seed. The problem
and the mathematical model were similar to those treated above.


The table gives altogether 12 comparisons, of which 11 are based
on material produced by Schindler and the last on material produced by
the authors of the article, J. Przyborowski and H. Wilenski. It will be
seen that the material as a whole is not as satisfactory as in the
preceding example. It seems to follow that if the samples of clover seed
are drawn by the method employed by Schindler, then the conclusions
concerning them drawn from the mathematical model involving the Poisson
law, Eq. (5), will not necessarily be very accurate. But it is possible
that the method of drawing samples of seeds may be so adjusted (this is
the opinion of the two authors quoted) that the number of dodders growing
per square could rightly be considered as a random variable following
the Poisson law.


Example 3. As mentioned above, if certain experiments show definite
divergence from a mathematical model that is strongly suggested by
intuition, then the divergence may be ascribed to "errors of
experimentation," and efforts may be made to change the experimental
technique in the hope that this may result in a more satisfactory
agreement between experimental data and the theory.
J. PRZYBOROWSKI AND H. WILENSKI
Biometrika 27, 277, 1935

Distribution of Dodder in Samples of Clover Seed (Schindler's Experiments)

χ² = 8.9169, P = 0.179

k = number of dodder seeds in a sample; N_k = observed frequency;
N·P_k = expected frequency; n' = number of groups;
n = number of degrees of freedom for the chi-square tables.
Another way, of course, having the same aim, is to alter the
mathematical model. However, the first method is frequently more
satisfactory. Attempts of this kind, to bring particular experiments into
agreement with probability models, are constantly being made in big
manufacturing works, arriving at what is called "statistically controlled
production." In this respect the reader will find it interesting to
consult the book by Dr. W. A. Shewhart, of the Bell Telephone
Laboratories, Economic Control of Quality of Manufactured Product,
Van Nostrand, New York, 1931.

I shall give here a description of efforts of this kind which
seem to be particularly interesting.


Many laboratories are engaged in what is called routine analysis.
Small quantities of certain materials are sent to the laboratory
for determining the content of a certain ingredient X. The sample is
subdivided into a few (three, four, sometimes five) portions, and those
are separately analyzed. Denote the particular results by x_1, x_2, x_3
and x_4 respectively, and by μ the "true" content of the ingredient X, so
that the x_i denote the measurements of μ.


Owing to experimental errors the measurements x_i differ from μ
and differ among themselves. Frequently there is evidence that the
measurements could be regarded as random variables following a normal
law of frequency


$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2} \qquad (7)$$


so that this formula forms the mathematical model of the experiments of
first order. The model may be used to estimate the value of μ. It is
obviously useless to try to obtain an accurate value of μ knowing only
the values of four measurements x_1, x_2, x_3 and x_4. But we can proceed
differently. Denote by f_1 and f_2 some two functions of the x_i. If the
x_i are random variables, then f_1 and f_2 will also be random variables,
and we may consider the probabilities of their satisfying any given
inequalities. We may also look for particular forms of the functions
f_1 and f_2 such that the probability of their satisfying a given
inequality shall be equal to any given number between zero and unity.
Starting from this point of view it has been found that the functions*


$$f_1 = \bar{x} - t_\alpha\, s'/\sqrt{n} \qquad (8)$$

and

$$f_2 = \bar{x} + t_\alpha\, s'/\sqrt{n}$$


have a remarkable property. Here x̄ is the arithmetic mean of the
measurements x_i, n their number, s' their estimated standard
deviation**, and t_α the value of Fisher's t corresponding to the number
of degrees of freedom on which s' is based, and to P = 1 - α, e.g.
P = .01. If the measurements x_i are independent random variables
following the normal law (7), then, whatever the values of μ and σ, the
probability of f_1 falling short of μ and of f_2 exceeding μ is exactly
equal to α = .99.

* J. Neyman, "Outline of a theory of estimation," Phil. Trans. Royal
Soc. A236, 333-380, 1937. See also the conferences on estimation and
confidence intervals, pp. 127-142 and 143-160 respectively.

** That is, s' is an estimate of σ; s'² = Σ(x_i - x̄)²/(n - 1); see p. 135.





This circumstance, discovered about 1930,* permits the estimation
of μ in the form of a random experiment. We perform the experimental
analysis, obtaining the values of the x_i, and then state that


$$\bar{x} - t_\alpha\, s'/\sqrt{n} \le \mu \le \bar{x} + t_\alpha\, s'/\sqrt{n} \qquad (9)$$








We may be wrong in this statement, but if the x_i do follow the law (7),
the probability of our being correct is equal to α = .99: in 99 percent
of such experiments our statement concerning μ will be correct.
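The claim that statement (9) is correct in about 99 percent of such experiments can be checked by simulation. A minimal sketch, assuming normally distributed measurements with arbitrarily chosen μ and σ (the function names and parameter values are hypothetical) and using the standard table values t_α = 63.657, 9.925 and 5.841 for α = .99 and n = 2, 3, 4:

```python
import math
import random
import statistics

# Illustrative sketch; names, mu, sigma and seed are hypothetical.
# Fisher's t for alpha = .99 (P = .01, two-sided) with n - 1 degrees of freedom.
T_ALPHA_99 = {2: 63.657, 3: 9.925, 4: 5.841}


def confidence_interval(xs: list[float]) -> tuple[float, float]:
    """Limits f1, f2 of (8): mean -/+ t_alpha * s' / sqrt(n)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s_prime = statistics.stdev(xs)  # divisor n - 1, as for s'
    half_width = T_ALPHA_99[n] * s_prime / math.sqrt(n)
    return mean - half_width, mean + half_width


def coverage(mu: float, sigma: float, n: int, trials: int, seed: int) -> float:
    """Share of simulated series of n analyses for which statement (9) is true."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        f1, f2 = confidence_interval(xs)
        if f1 <= mu <= f2:
            correct += 1
    return correct / trials


# Hypothetical "true" content mu = 10.0 and sigma = 0.3.
print(coverage(mu=10.0, sigma=0.3, n=4, trials=20000, seed=7))
```

The proportion printed should lie close to α = .99, whatever values of μ and σ are chosen.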








The arbitrarily chosen number α is called the confidence
coefficient and the interval between f_1 and f_2 the confidence interval.
If the number of measurements is small, something like n = 4, then the
value of t_α is large, and the accuracy of estimating μ, as measured
by the length of the confidence interval

$$f_2 - f_1 = 2\, t_\alpha\, s'/\sqrt{n} \qquad (10)$$

is unsatisfactory.


In what preceded, the value of σ in Eq. (7) was considered unknown.
If, however, σ is known, then the confidence interval will be written

$$\bar{x} - T_\alpha\, \sigma/\sqrt{n} \le \mu \le \bar{x} + T_\alpha\, \sigma/\sqrt{n} \qquad (11)$$

where T_α is the value of t_α corresponding to an infinite number of
degrees of freedom in the estimate of σ. What this means in practice may
be judged from the following comparison. If α = .99, then T_α = 2.576,
no matter what n is. At the same time the values of t_α are,
respectively,

t_α = 63.657 if n = 2
t_α = 9.925 if n = 3
t_α = 5.841 if n = 4

etc.
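The comparison can be made concrete by computing, for each n, how many times longer interval (10) is than interval (11) in the idealized case where s' happens to equal σ. A small sketch; it takes T_α from the standard normal quantile in Python's `statistics.NormalDist` and uses the standard table values of t_α (the function name is hypothetical):

```python
from statistics import NormalDist

# Illustrative sketch; the function name is hypothetical.
# Standard table values of t_alpha for alpha = .99 (n - 1 degrees of freedom).
T_ALPHA_99 = {2: 63.657, 3: 9.925, 4: 5.841}

# T_alpha: the limit of t_alpha as the degrees of freedom grow, i.e. the
# two-sided 99 percent point of the standard normal law.
T_INF = NormalDist().inv_cdf(0.995)


def width_ratio(n: int) -> float:
    """Length of (10) divided by length of (11), assuming s' equals sigma."""
    return T_ALPHA_99[n] / T_INF


for n in sorted(T_ALPHA_99):
    print(n, round(width_ratio(n), 2))
print(round(T_INF, 3))
```

For n = 4 the ratio is about 2.3, which gives a rough measure of what is gained in accuracy when σ is known.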


It follows that whenever it is known not only that the analyses
made in some particular laboratory provide numbers x that for practical
purposes could be considered as particular values of a random variable
following the normal probability law (7), but also that the standard
deviation σ has permanently this or that particular numerical value,
then the same few parallel analyses could be used to provide an equally
reliable but a much more accurate statement concerning the value of μ.
Therefore, if a laboratory is permanently engaged in performing analyses
of some particular kind, it must obviously be interested (i) in keeping
the value of σ constant over long periods of time, (ii) in estimating
this value of σ as accurately as possible, and (iii) in keeping watch
over possible changes in σ.

* See references on pages 157 and 158.

In order to keep σ constant, say throughout a year, it is necessary
to eliminate all factors that may influence the accuracy of the analyses.
This is frequently done; but before trying to estimate the value of σ,
presumed to be constant, and before applying formula (11) instead of
(9), we must see whether the measurements that are being obtained do
agree with the mathematical model involving a constant σ. Otherwise a
repeated application of formula (11) may give a much greater percentage
of errors than that expected.
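The danger described here can be illustrated by a simulation. In the sketch below, which rests entirely on assumed numbers, the true σ alternates between 0.5 and 1.5 from one series of four analyses to the next, while formula (11) is applied with a single pooled value σ₀ = √((0.5² + 1.5²)/2). The function name and all parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

# Illustrative sketch; name, sigmas, n and seed are hypothetical.
T_INF = NormalDist().inv_cdf(0.995)  # T_alpha for alpha = .99


def error_rate_formula_11(sigmas: list[float], sigma0: float, n: int,
                          trials: int, seed: int) -> float:
    """Share of series of n analyses for which interval (11), computed with
    the fixed value sigma0, fails to contain mu, while the true standard
    deviation cycles through the values in `sigmas`."""
    rng = random.Random(seed)
    mu = 0.0
    half_width = T_INF * sigma0 / math.sqrt(n)
    misses = 0
    for i in range(trials):
        sigma = sigmas[i % len(sigmas)]
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(fmean(xs) - mu) > half_width:
            misses += 1
    return misses / trials


stable = error_rate_formula_11([1.0], sigma0=1.0, n=4, trials=20000, seed=3)
drifting_sigmas = [0.5, 1.5]
pooled = math.sqrt(fmean(s * s for s in drifting_sigmas))  # long-run sigma
drifting = error_rate_formula_11(drifting_sigmas, sigma0=pooled, n=4,
                                 trials=20000, seed=3)
print(stable, drifting)
```

With a constant σ the error rate comes out near the nominal 1 - α = .01; with the drifting σ it is noticeably larger (about .027 in expectation under these assumptions), because the series with σ = 1.5 are judged against too narrow an interval.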


This circumstance was realized by J. Przyborowski, who published
the following table illustrating his efforts to stabilize the accuracy
of his analyses of oats. In this table, s_1² is the estimated variance
of four parallel analyses, and s_0² is the arithmetic mean of a number of
such variances calculated for a long period of time, such as a year or
more. If the value of σ² was actually constant during such a period,
then the value of s_0² would be its very accurate estimate, and the
mathematical model adopted would imply a known distribution of the ratio
v = 3s_1²/s_0².
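Under the assumption that s_0² may be treated as the exact σ², and that s_1² is computed with divisor 3 from four analyses, the ratio v = 3s_1²/s_0² follows the chi-square law with 3 degrees of freedom. The sketch below computes the probabilities of the classes 0-1, 1-2, ..., 5-6 and above 6 from the closed-form distribution function for 3 degrees of freedom; the class limits and function names are choices made for the example, and multiplying these probabilities by the number of series analyzed gives expected frequencies of the kind compared in the table.

```python
import math

# Illustrative sketch; assumes v = 3 s1^2 / s0^2 is chi-square with 3 d.f.
# Function names and class limits are hypothetical.


def chi2_3_cdf(x: float) -> float:
    """Distribution function of chi-square with 3 degrees of freedom:
    F(x) = erf(sqrt(x/2)) - sqrt(2x/pi) * exp(-x/2)."""
    if x <= 0.0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def class_probabilities(edges: list[float]) -> list[float]:
    """P{edges[i] <= v < edges[i+1]} for consecutive edges, then P{v >= last edge}."""
    cdf = [chi2_3_cdf(e) for e in edges]
    probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
    probs.append(1.0 - cdf[-1])
    return probs


edges = [0, 1, 2, 3, 4, 5, 6]  # classes 0-1, 1-2, ..., 5-6, then "above 6"
print([round(p, 4) for p in class_probabilities(edges)])
```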


The expected and observed frequencies of the values of v are
compared in the table for various periods. And here we see the curious
results of the efforts to stabilize the accuracy of analyses. The year
1925 is very bad; 1927 and 1928 show slight improvement, but are still
bad. 1929 and 1930 are excellent; but this probably caused a false sense
of security in the personnel, and the next year, 1931, is again bad.
However, the three-year period 1929-1931 seems to be satisfactory. We
may reasonably hope that the experience of 1931 has stimulated the
staff of Professor Przyborowski's laboratory and that its confidence
intervals based on formula (11), where the value of σ is estimated from
a great number of previous experiments, do give correct statements
concerning μ in nearly the expected percentage of cases, 100α.
Distribution of estimated error variance in routine
analyses of four parallel samples of oats.
Przyborowski, Polish Agric. Forest. Journ. vol. 30, 1933.

Goodness of fit of the distribution of v, by period:

Period        chi-square    P
1925          65.101        .00000
1927          22.928        .000127
1928          15.217        .0094
1929           4.332        .36
1930           2.084        .72
1931          12.068        .017
1929-1931     -             .41
4. Summary. Now let us sum up the main points that I have tried
to emphasize. When speaking about probability it is necessary to
distinguish* three different but related aspects of the problem:


(1) a mathematical theory, e.g., the one described in my first 
lecture; 


(2) the frequency of actual occurrences; 


(3) the psychological expectation of the participant. 


The mathematical theory need not be the one I described, but if
it is mathematically accurate, it will have nothing to do with the
outside world, and therefore nothing to do with either (2) or (3), for
the good reason that an accurate mathematical theory implies accurate
definitions and axioms, and that in the outside world there are no
objects that satisfy them except within limits "good enough for
practical purposes."


The theory of probability may be constructed to provide models
corresponding in some sense to certain phenomena of the outside world.
And here we may distinguish a divergence. (i) Some authors try to
provide mathematical models of what I have called random experiments,
falling under (2). The theory presented in my first lecture is one of
the types coming under this heading; the theory of Richard von Mises
is another. (ii) When building a mathematical theory of probability,
we may aim at a model of the changes in the state of the human mind
concerning certain statements that occur as a result of changing the
amount of known facts. This view is exemplified by the theory built up
by Harold Jeffreys.** It will be noticed that the theory of probability
of my first lecture has nothing to do with the "state of mind," though,
having found that the probability of a certain property is equal, e.g.,
to 0.0001, the state of our mind will probably be influenced by this
finding.


As I have mentioned, any theory may be correct if the authors
are sufficiently accurate in their deductions. However, it is my strong
opinion that no mathematical theory refers exactly to happenings in the
outside world and that any application requires a solid bridge over a
precipice. The construction of such a bridge consists, first, in
explaining in what sense the mathematical model provided by the theory
is expected to "correspond" to certain actual happenings, and second, in
empirically checking whether the correspondence is satisfactory.


The examples I gave above, and many others that could easily be
quoted, indicate that by taking care both in constructing a mathematical
model and in carrying out the experiments, the bridge between the theory
of probability I have sketched and certain fields of application may be
very solid.

* Compare with H. Levy and L. Roth: Elements of Probability,
Oxford, 1936.

** See Jeffreys' Scientific Inference, Cambridge, 1931, and numerous
papers in the Proceedings of the Royal Society (series A) and in the
Proceedings of the Cambridge Philosophical Society.
LECTURE III: ON THE TESTING OF STATISTICAL HYPOTHESES


1. The traditional procedure in testing statistical hypotheses.
The present lecture should not be considered as a direct continuation of
the preceding ones, which were systematically connected. However, I shall
use the conceptions discussed in my first two lectures and perhaps some
more. It would be impossible to give all the necessary definitions here,
and I must assume them to be known.








The traditional procedure in testing statistical hypotheses is
commonly known but, the procedure being traditional, opinions concerning
its exact nature vary. I shall describe here one of the versions that
seems to summarize the common phases in the history of several well-known
tests, such as the chi-square test for goodness of fit, Student's z test,
and others.








Having to test some specified (in early stages, very vaguely
specified) statistical hypothesis H concerning the random variables

x_1, x_2, ..., x_n,

we used to choose some function T of those x_i which, for certain
reasons, seemed to be suitable as a test criterion. Pearson's chi-square
and Student's z are instances of such criteria. The next step, and a
difficult one, consisted in deducing an accurate or at least an
approximate probability law p(T|H), which the chosen criterion T would
follow if the hypothesis H were true. The graphs of the probability laws
considered usually represented curves with a single maximum at a certain
point of the range, decreasing towards the ends. This suggested a
classification of possible samples into two not very distinctly divided
categories, probable and improbable samples. If a sample E led to a value
of the criterion T for which the value of p(T|H) is small compared with
its maximum, then the sample E would be called improbable, or the
hypothesis H improbable, and inversely. You will certainly remember
instances where both very small and very large values of chi-square are
supposed to suggest that something is wrong.


Having obtained an improbable sample in the above sense, the usual
way of reasoning was this: "Were the hypothesis H true, then the
probability of getting a value of T as improbable as, or more improbable
than, that actually observed would be (e.g.) P = 0.00001. It follows that
if the hypothesis H be true, what we actually observe would be a miracle.
We don't believe in miracles nowadays and therefore we do not believe in
H being true."


The above procedure, or something like it, has been applied since
the invention of the first systematically applied test, the Pearson
chi-square of 1900, and has worked, on the whole, satisfactorily. However,
now we have become sophisticated and desire to have a theory of tests.
Above all we want to know why we should use this or that particular
function T of the x_i as a criterion. Why should we test the goodness
of fit by calculating

$$\chi^2 = \sum \frac{(m' - m)^2}{m} \qquad (1)$$

and not, say,

$$\chi_1^2 = \sum \frac{(m' - m)^2}{m'} \qquad (2)$$

or some other function of the observed and expected frequencies (3),
or anything else? What is the actual meaning of a statistical test?
What is the principle of choosing between several tests that may be
suggested for the same hypothesis? It is the purpose of the present
lecture to discuss some of these questions and to explain some basic
ideas underlying the contributions to the theory of testing statistical
hypotheses for which Professor E. S. Pearson and I are responsible.


The first question I shall discuss is this: when selecting a
criterion to test some particular hypothesis H, should we consider that
hypothesis only, or something else? It is known that some statisticians
are of the opinion that good tests could be devised on consideration of
the hypothesis tested only. My opinion is that this is impossible, and
that if satisfactory tests are actually devised without explicit
consideration of something beyond the hypothesis tested, it is because
the respective authors subconsciously take into consideration certain
relevant circumstances, namely, the alternative hypotheses that may be
true if the hypothesis tested is wrong. It is rather difficult to discuss
what an author may have in his mind subconsciously, or even consciously.
But it is easier to consider situations that may present themselves
when we are forced to select a test for a particular hypothesis H with
nothing to base our choice on except this hypothesis itself.


Suppose, then, that we have to test some hypothesis H, and that
two different criteria, T_1 and T_2, are suggested. Which of them should
we use? What circumstances, referring to H and to nothing else, should
influence our choice? I could not think of all the suggestions that
could be made, but I do remember seeing opinions that the criterion with
the smaller standard deviation would be preferable.


Let us generalize this suggestion and consider more closely the
tentative principle that the choice between possible criteria should be
made on the properties of their distributions as determined by H. This
principle, call it principle I, would obviously cover the case of the
relative size of the standard deviations.


With regard to principle I, I shall show that it is not
sufficient for the choice. In fact I shall prove that there may be two
criteria having the following properties:


(i) both have identical frequency distributions, and therefore,
using principle I only, it will be impossible to choose between them;


(ii) whenever one of these criteria has the most "improbable"
values, thus "disproving" the hypothesis tested, the values of the other
are just the most "probable" ones. This last circumstance will make it
necessary to choose one of the criteria.








Having in view the above situation, I shall mention another principle,
to be called principle II, which has been suggested by certain eminent
workers in theoretical statistics: whenever you have two (or more)
criteria, choose the one which, on the sample obtained, is less favorable
to the hypothesis you test.


This principle implies, of course, that criteria could, and should, 
be chosen after the sample is drawn and analyzed. 


I shall show that, if this principle is adopted, then it is useless
to make any calculations having in view the testing of hypotheses:
given a certain amount of mathematical skill we shall be able to
disprove any hypothesis on any sample.


The above two principles do not exhaust all the possibilities.
There may be other principles that also do not go beyond the
consideration of the hypothesis tested. For example, we may require some
particular properties of the functions T to be used as criteria, e.g.
that they should be symmetrical with respect to the random variables,
etc. However, I could not think of any such limitation that would seem
reasonable. There are particular cases known when a recognized criterion
is not symmetrical with respect to all the x_i, for instance when it is
represented either by the smallest or by the greatest of all the
observations. Therefore, without claiming that the two propositions which
I am going to prove below provide decisive evidence that it is absolutely
impossible to make the choice of criteria without explicitly or tacitly
considering hypotheses alternative to the one that is being tested, I
am inclined to think that this conclusion is highly probable. Anyhow,
the two propositions do cover a certain range of possibilities and clear
away certain popular misconceptions. They show, for instance, that an
argument like "use T_1 rather than T_2 because its standard error is
smaller" is not by itself persuasive. Let us now go into some details.


2. Insufficiency of Principle I. Consider the system of n
random variables

x_1, x_2, ..., x_n,

known to be independent and following the normal law

$$p(x_1, \ldots, x_n) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\sum (x_i - \mu)^2/2\sigma^2} \qquad (4)$$

where σ > 0 and μ are constants. Suppose it is desired to test
the hypothesis H that μ = 0. This is known as Student's hypothesis.

The generally accepted criterion to test H is that invented by Student,
namely, to calculate

$$z = \bar{x}/s \qquad (5)$$

where

$$\bar{x} = \frac{1}{n}\sum x_i, \qquad n s^2 = \sum (x_i - \bar{x})^2. \qquad (6)$$

The probability law of z, if the hypothesis H be true, is given by

$$p(z) = C\,(1 + z^2)^{-n/2} \qquad (7)$$

where

$$\frac{1}{C} = \int_{-\infty}^{+\infty} (1 + z^2)^{-n/2}\, dz. \qquad (8)$$

The hypothesis H is to be rejected whenever the value |z'| of |z|
calculated for the sample is so large that

$$P\{|z| > |z'|\} = 2\int_{|z'|}^{\infty} p(z)\, dz \qquad (9)$$

is considered "small".
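Equations (5) to (9) can be evaluated numerically without tables. The minimal sketch below (function names are hypothetical) computes z from a sample by (5) and (6) and obtains P{|z| > |z'|} by the substitution z = tan θ, under which (1 + z²)^(-n/2) dz becomes cos^(n-2) θ dθ, so that both the normalizing integral (8) and the tail integral in (9) run over finite ranges and can be handled by Simpson's rule. Fisher's t with n - 1 degrees of freedom is connected with z by t = z√(n - 1).

```python
import math

# Illustrative sketch; function names and the sample below are hypothetical.


def student_z(xs: list[float]) -> float:
    """Student's z = mean / s, with n s^2 = sum (x_i - mean)^2, as in (5)-(6)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return mean / s


def simpson(f, a: float, b: float, m: int = 2000) -> float:
    """Composite Simpson rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def z_tail_probability(z_obs: float, n: int) -> float:
    """P{|z| > |z_obs|} of (9), with p(z) = C (1 + z^2)^(-n/2), n >= 2."""
    def integrand(theta: float) -> float:
        # (1 + tan^2 theta)^(-n/2) * sec^2 theta = cos(theta)^(n - 2)
        return math.cos(theta) ** (n - 2)

    inv_c = simpson(integrand, -math.pi / 2, math.pi / 2)  # 1/C of (8)
    tail = simpson(integrand, math.atan(abs(z_obs)), math.pi / 2)
    return 2 * tail / inv_c


# Hypothetical sample of n = 4 measurements.
sample = [0.8, 1.9, 1.1, 2.4]
z_obs = student_z(sample)
print(round(z_obs, 3), round(z_tail_probability(z_obs, len(sample)), 4))
```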
To prove the insufficiency of principle I as explained above,
I shall now define another criterion, ζ, which is to have the following
properties:

1. If H be true, then the probability law of ζ is identical with
that of z, so that

$$p(\zeta) = C\,(1 + \zeta^2)^{-n/2} \qquad (10)$$

2. The absolute value of the product zζ cannot exceed unity, i.e.

$$|z\zeta| \le 1. \qquad (11)$$

If the z criterion were used to test H, then this hypothesis
would be rejected whenever |z| is large. In fact the large values of
|z| are "improbable" whenever H is true. Owing to (11), whenever |z|
is large, |ζ| must be small, and inversely; it follows that whenever
one of the alternative criteria z and ζ indicates that the hypothesis
H should be rejected, the other is bound to protest that there is no
reason for such rejection. Therefore, whenever one of the criteria has
a large absolute value we are compelled to choose the one whose verdict
we shall respect. Principle I will not help us in the choice,
because the probability laws of z and ζ are identical. This completes
the proof of the insufficiency of principle I.


In order to define ζ, let us assume that the x_i are numbered in
the order in which they are given by observation. Let

$$x' = \frac{x_1 - x_2}{\sqrt{2n}} \qquad (12)$$

and

$$n s'^2 = \sum x_i^2 - n x'^2. \qquad (13)$$

The functions x' and s' thus defined will be called the quasi mean and
the quasi standard deviation of the x_i. Now I shall prove what I shall
call Proposition a: the ratio

$$\zeta = x'/s' \qquad (14)$$

has the properties 1 and 2 as stated above.
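Definitions (12) to (14) and the inequality (11) are easy to check numerically. The sketch below takes x' = (x_1 - x_2)/√(2n) and ns'² = Σx_i² - nx'² as written in (12) and (13), computes z and ζ for many simulated samples (the sample size, μ, σ, seed and number of trials are arbitrary choices, and the function name is hypothetical), and records the largest value of |zζ| encountered.

```python
import math
import random

# Illustrative sketch; the function name and all parameters are hypothetical.


def z_and_zeta(xs: list[float]) -> tuple[float, float]:
    """Student's z = mean/s and the quasi ratio zeta = x'/s' of (12)-(14)."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / n        # n s^2 = sum (x_i - mean)^2
    q_mean = (xs[0] - xs[1]) / math.sqrt(2 * n)     # quasi mean x', (12)
    q_s2 = sum(x * x for x in xs) / n - q_mean ** 2  # quasi variance s'^2, (13)
    return mean / math.sqrt(s2), q_mean / math.sqrt(q_s2)


rng = random.Random(0)
largest = 0.0
for _ in range(10000):
    # Any mu and sigma will do: (11) is an algebraic inequality valid for every sample.
    xs = [rng.gauss(0.3, 2.0) for _ in range(5)]
    z, zeta = z_and_zeta(xs)
    largest = max(largest, abs(z * zeta))
print(largest)  # should not exceed 1, in line with (11)
```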


In order to prove property 1, it will be sufficient to show that the
simultaneous probability law of x' and s' is identical with that of the
ordinary mean x̄ and standard deviation s.


If the hypothesis H be true, then μ = 0 and

$$p(x_1, \ldots, x_n) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\sum x_i^2/2\sigma^2} \qquad (15)$$
Let us introduce a new system of random variables

y_1, y_2, ..., y_n,

connected with the x_i by an orthogonal linear transformation of the form

$$\begin{aligned} x_1 &= y_1/\sqrt{n} + y_2/\sqrt{2} + \cdots \\ x_2 &= y_1/\sqrt{n} - y_2/\sqrt{2} + \cdots \\ &\;\;\vdots \\ x_n &= y_1/\sqrt{n} + \cdots \end{aligned} \qquad (16)$$

It will be noticed that

$$y_2/\sqrt{n} = (x_1 - x_2)/\sqrt{2n} = x' \qquad (17)$$

and is therefore identical with our quasi mean defined in Eq. (12). We
shall return to this notation after a while. Furthermore,

$$\sum x_i^2 = \sum y_i^2 \qquad (18)$$

and, having regard to (13), we shall have

$$n s'^2 = y_1^2 + y_3^2 + y_4^2 + \cdots + y_n^2. \qquad (19)$$
The probability law of the y_i will be deduced from Eq. (15)
following the steps explained in my first lecture, namely

$$p(y_1, \ldots, y_n) = p(x_1, \ldots, x_n)\,|\Delta| \qquad \text{(Eq. 17, page 17)} \qquad (20)$$

where |Δ| is the Jacobian defined by Eq. (16) of page 17, and the x_i
in the right-hand side should be expressed in terms of the y_i. Since
the transformation (16) is orthogonal, |Δ| = 1, and easy calculations
give

$$p(y_1, \ldots, y_n) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\sum y_i^2/2\sigma^2}$$

so that, on replacing y_2 by x' = y_2/√n,

$$p(x', y_1, y_3, \ldots, y_n) = \sqrt{n}\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-n(x'^2 + s'^2)/2\sigma^2} \qquad (21)$$

where ns'² stands for the sum of squares (19). Our next step must
consist in introducing still another system of variables

u_1, u_2, ..., u_n,

one of which would be identical with x' and another with s'. We shall
put


$$\begin{aligned} x' &= u_1 \\ y_1 &= \sqrt{n}\, u_2 \cos u_3 \cos u_4 \cdots \cos u_n \\ y_3 &= \sqrt{n}\, u_2 \sin u_3 \cos u_4 \cdots \cos u_n \\ y_4 &= \sqrt{n}\, u_2 \sin u_4 \cos u_5 \cdots \cos u_n \\ &\;\;\vdots \\ y_n &= \sqrt{n}\, u_2 \sin u_n \end{aligned} \qquad (22)$$

The range of variation of the new variables is determined by the
following inequalities:

$$-\infty < u_1 < +\infty, \qquad 0 < u_2 < +\infty, \qquad 0 \le u_3 < 2\pi, \qquad -\pi/2 \le u_j \le \pi/2 \quad (j = 4, 5, \ldots, n) \qquad (23)$$

wherefore outside these limits the probability law of the u_i is
identically equal to zero.


It will be easily seen that

$$u_1 = x', \qquad u_2^2 = \frac{1}{n}\left(y_1^2 + y_3^2 + \cdots + y_n^2\right) = s'^2 \qquad (24)$$

and later on we shall drop the notation u_1 and u_2, substituting for
them x' and s' respectively. Easy calculations give for the Jacobian





$$\frac{\partial(x', y_1, y_3, \ldots, y_n)}{\partial(u_1, u_2, \ldots, u_n)} = n^{(n-1)/2}\, u_2^{\,n-2} \cos u_4 \cos^2 u_5 \cdots \cos^{n-3} u_n \qquad (25)$$

and it follows that

$$p(u_1, u_2, \ldots, u_n) = \left(\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\right)^{n} u_2^{\,n-2}\, e^{-n(u_1^2 + u_2^2)/2\sigma^2} \cos u_4 \cos^2 u_5 \cdots \cos^{n-3} u_n \qquad (26)$$


In order to obtain the simultaneous probability law of u_1 and u_2,
or, what comes to the same thing, of x' and s', we must integrate (26)
with respect to u_3, u_4, ..., u_n from -∞ to +∞. Owing to the fact that
the integrand differs from zero only within the limits shown in (23),
and that these limits for u_3, u_4, ..., u_n do not depend on the values
of u_1 and u_2, we shall have at once
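$$p(u_1, u_2) = C_1 \left(\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\right)^{n} u_2^{\,n-2}\, e^{-n(u_1^2 + u_2^2)/2\sigma^2} \qquad (27)$$

wherein

$$C_1 = \int\!\cdots\!\int_{w} \cos u_4 \cos^2 u_5 \cdots \cos^{n-3} u_n \, du_3\, du_4 \cdots du_n \qquad (28)$$

and the region of integration, w, is determined by

$$0 \le u_3 < 2\pi, \qquad -\pi/2 \le u_j \le \pi/2 \quad (j = 4, 5, \ldots, n). \qquad (29)$$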


Remembering that u_1 and u_2 are identical with x' and s' respectively,
we have here

$$p(x', s') = C_1 \left(\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\right)^{n} s'^{\,n-2}\, e^{-n(x'^2 + s'^2)/2\sigma^2} \qquad (30)$$


We see that the quasi mean and the quasi standard deviation as defined
by (12) and (13) do follow a probability law identical with that of the
ordinary mean x̄ and standard deviation s of the x_i (under H the joint
law of x̄ and s is given by the same expression with x̄ and s in place of
x' and s'). In order to obtain the probability law of the ratio ζ we must
now perform on Eq. (30) exactly the same operations that lead to the
probability law of Student's z, and it is obvious that the probability
law of ζ will be found to be identical with that of z. This proves the
first part of the proposition.


Let us now prove part 2, namely, that |zζ| ≤ 1. For this purpose notice that, whatever the real numbers a and b, we shall have

$$
(a - b)^2 = a^2 - 2ab + b^2 \ge 0
\qquad (31)
$$

and therefore

$$
2ab \le a^2 + b^2
\qquad (32)
$$

It follows further that for any real numbers a and b,

$$
(a + b)^2 = a^2 + 2ab + b^2 \le 2(a^2 + b^2)
\qquad (33)
$$












If s is the ordinary standard deviation of the x_i and x̄ their mean, then*

$$
ns^2 = \sum (x_i - \bar{x})^2 \ge (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2
\qquad (34)
$$

On the other hand the definition of the quasi mean gives us

$$
2n x'^2 = (x_1 - x_2)^2 = \left[(x_1 - \bar{x}) - (x_2 - \bar{x})\right]^2
\qquad (35)
$$

and, owing to (33),

$$
2n x'^2 \le 2\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2\right]
\qquad (36)
$$

Comparing (34) and (36) we see that

$$
x'^2 \le s^2
\qquad (37)
$$

an inequality between the squares of the quasi mean and the ordinary standard deviation. From the definition of the quasi standard deviation (13) it follows that

$$
\sum x_i^2 = n(s'^2 + x'^2) = n(s^2 + \bar{x}^2)
\qquad (38)
$$

Therefore

$$
s'^2 = s^2 + \bar{x}^2 - x'^2
\qquad (39)
$$

and, owing to (37), we must have

$$
\bar{x}^2 \le s'^2
\qquad (40)
$$


Multiplying (37) and (40) and dividing the resulting inequality by the product s^2 s'^2, we get

$$
\left(\frac{x'^2}{s'^2}\right)\left(\frac{\bar{x}^2}{s^2}\right) = \zeta^2 z^2 \le 1
\qquad (41)
$$

which is equivalent to |zζ| ≤ 1, or Eq. (11) of page 36. This completes the proof of part 2 of proposition a, showing that Principle I by itself is not sufficient for a choice between alternative criteria that may be suggested for testing a given hypothesis.

* The sign Σ, unless accompanied by other indications, will signify summation over i from 1 to n.
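The inequality (41) can be checked numerically. The sketch below assumes, as Eq. (35) indicates, that the quasi mean of Eq. (12) is x' = (x_1 - x_2)/√(2n). It also assumes, in agreement with Eq. (38), that s'^2 = (1/n)Σx_i^2 - x'^2. Both are our readings, since (12) and (13) are not reproduced in this section. Student's z is taken as x̄/s with s^2 = (1/n)Σ(x_i - x̄)^2. The function names are hypothetical.

```python
import math
import random


def student_z(x):
    """Student's z = mean / s, with s^2 = (1/n) * sum (x_i - mean)^2."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    return mean / s


def quasi_zeta(x):
    """zeta = x'/s' with the assumed quasi mean x' = (x_1 - x_2)/sqrt(2n)
    and quasi S.D. s'^2 = (1/n) * sum x_i^2 - x'^2."""
    n = len(x)
    xq = (x[0] - x[1]) / math.sqrt(2 * n)
    sq = math.sqrt(sum(xi * xi for xi in x) / n - xq * xq)
    return xq / sq


rng = random.Random(1)
worst = 0.0
for _ in range(10000):
    n = rng.randint(3, 12)
    centre = rng.uniform(-2.0, 2.0)
    x = [rng.gauss(centre, 1.0) for _ in range(n)]
    worst = max(worst, abs(student_z(x) * quasi_zeta(x)))
print(worst)                       # stays at or below 1, as (41) requires

# A sample for which z is large while zeta is exactly zero (because x_1 = x_2):
e = [1.0, 1.0, 1.1, 0.9, 1.0]
print(student_z(e), quasi_zeta(e))
```

Whenever one of the two criteria signals a marked departure from μ = 0, the other cannot. This is the sense in which Principle I alone does not decide between z and ζ.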


3. Consequences of supplementing Principle I by Principle II.


We shall now show that Principle I could not be usefully supplemented by Principle II. The combination of the two principles would read as follows. If there are several criteria for testing a given hypothesis H_0, all following the same probability law as determined by H_0, then the choice among them should be made after the sample is drawn and examined, and we should choose the test that appears to be the least favorable to H_0. We have already seen that if "Student's hypothesis" (page 36) be true, then Student's z is not the only function of the x_i following the familiar probability law (7). We shall now show that, whatever the sample E' observed in some particular case, not all the x_i being equal to zero, it is possible to find a criterion, say ζ'', with two properties: for this particular sample it possesses the value +∞, and in repeated sampling it follows exactly the same law as z and ζ discussed above. If we adopt both Principle I and Principle II, then we shall have to test Student's hypothesis using ζ'', and this will lead to its rejection. Thus in all cases, the only exception being when all the observed x_i are equal to zero, Student's hypothesis will have to be rejected. This shows that the combination of the two principles I and II is not a reasonable solution of the difficulty.


I shall now call the attention of the reader to the distinction between x'_i and x_i used below. The symbol x_i will mean, as before, the random variable following the law (15). On the other hand, x'_i will denote a value of x_i observed in some particular case.


Proposition b. Whatever be the sample

$$
E':\quad x'_1,\; x'_2,\; \ldots,\; x'_n
\qquad (42)
$$

observed in some particular case, one at least of the x'_i being different from zero, it is possible to define a criterion ζ'' represented by a function of the x_i and having the following properties:

(i) The probability law of ζ'', as determined by H_0, is the same as that of Student's z and ζ, Eq. (7) page 36.

(ii) The value ζ''(E') of ζ'', calculated for the sample E', is equal to infinity.


It will be noticed that ζ'' will have to be adjusted to the sample E' already observed. Therefore the values (42) will have to enter into the expression of ζ''. They are constant numbers and will play the role of coefficients. On the other hand, ζ'' will depend also on the random variables x_i.
Proof of part (i) of proposition b. Since the order in which the x'_i are numbered is of no importance, we may assume that x'_1, x'_2, ..., x'_m are different from zero and that the remaining ones, if any, are equal to zero, m ≤ n. Before defining ζ'' we shall need the numbers a_1, a_2, ..., a_n, which are to be connected with the x'_i by the relations

$$
a_i = \frac{x'_i}{\sqrt{x_1'^2 + x_2'^2 + \cdots + x_n'^2}}, \qquad i = 1, 2, \ldots, n
\qquad (43)
$$

Obviously a_i ≠ 0 for i = 1, 2, ..., m, but a_i = 0 for i = m + 1, ..., n. Also

$$
a_1^2 + a_2^2 + \cdots + a_n^2 = 1
\qquad (44)
$$


Further steps will consist in defining a pseudo mean x'' and a pseudo standard deviation s'', then in making the identification

$$
\zeta'' = x''/s''
\qquad (45)
$$

Here the pseudo mean and pseudo S.D. are defined by

$$
x'' = (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n)/\sqrt{n}
\qquad (46)
$$

and

$$
s''^2 = \frac{1}{n}\sum x_i^2 - x''^2
\qquad (47)
$$

It will be noticed that if a_i = 1/√n for i = 1, 2, ..., n, then the pseudo mean and pseudo S.D. become identical with the ordinary ones (x̄ and s).


It will be sufficient to show the existence of a system of variables v_1, v_2, ..., v_n whose elementary probability law (Lecture I, page 16), as determined by H_0, is

$$
p(v_1, v_2, \ldots, v_n) = \frac{\sqrt{n}}{(\sigma\sqrt{2\pi})^n}\,e^{-n(v_1^2 + s''^2)/2\sigma^2}
\qquad (48)
$$

wherein

$$
v_1 = x'' \quad\text{and}\quad n s''^2 = v_2^2 + v_3^2 + \cdots + v_n^2
\qquad (49)
$$

To show that v_1, ..., v_n exist with the probability law (48) we may introduce

$$
B_k = \frac{a_k}{\sqrt{(a_1^2 + \cdots + a_{k-1}^2)(a_1^2 + \cdots + a_k^2)}} \qquad \text{for } k = 2, 3, \ldots, m
\qquad (50)
$$
and relate v_1, ..., v_n to x_1, ..., x_n by the following systems of equations:

$$
\begin{aligned}
x_1 &= \sqrt{n}\,a_1 v_1 + a_1(B_2 v_2 + B_3 v_3 + \cdots + B_m v_m)\\
x_2 &= \sqrt{n}\,a_2 v_1 - (a_1^2/a_2)\,B_2 v_2 + a_2(B_3 v_3 + \cdots + B_m v_m)\\
x_3 &= \sqrt{n}\,a_3 v_1 - \left[(a_1^2 + a_2^2)/a_3\right] B_3 v_3 + a_3(B_4 v_4 + \cdots + B_m v_m)\\
&\;\;\vdots\\
x_{m-1} &= \sqrt{n}\,a_{m-1} v_1 - \left[(a_1^2 + \cdots + a_{m-2}^2)/a_{m-1}\right] B_{m-1} v_{m-1} + a_{m-1} B_m v_m
\end{aligned}
\qquad (51)
$$

Finally, if m = n, then

$$
x_n = \sqrt{n}\,a_n v_1 - \left[(a_1^2 + \cdots + a_{n-1}^2)/a_n\right] B_n v_n
\qquad (52)
$$

Otherwise,

$$
x_m = \sqrt{n}\,a_m v_1 - \left[(a_1^2 + \cdots + a_{m-1}^2)/a_m\right] B_m v_m, \qquad x_{m+1} = v_{m+1}, \;\ldots,\; x_n = v_n
\qquad (53)
$$

With some algebraic reduction and the fact that a_1^2 + a_2^2 + ... + a_n^2 = 1 (Eq. 44), it will be found that

$$
v_1 = (1/\sqrt{n})(a_1 x_1 + \cdots + a_n x_n) = x''
\qquad (54)
$$

and that

$$
x_1^2 + x_2^2 + \cdots + x_n^2 = n v_1^2 + (v_2^2 + \cdots + v_n^2)
\qquad (55)
$$

The Jacobian

$$
\Delta = \frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(v_1, v_2, \ldots, v_n)}, \qquad |\Delta| = \sqrt{n},
$$

as is not difficult to work out from Eqs. (51), (52), (53). From Eq. (55) and the value of the Jacobian, it follows by applying Eq. (17) of page 17 (Lecture I) that if Eq. (15) is the simultaneous elementary probability law of x_1, x_2, ..., x_n, then that of v_1, v_2, ..., v_n must be as written in Eq. (48).


Now Eq. (48) has the same form as Eq. (21), and formulas (49) are like (17) and (19). It is therefore clear that the steps required to deduce the probability law of ζ'' from (48) would be identical with those already used to deduce the probability law of ζ from (21), and the result must be the same. This completes the proof.
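The construction (50) to (53) amounts to completing the unit vector (a_1, ..., a_n) to an orthogonal matrix, with v_1 additionally scaled by √n. The Python sketch below builds the rows of that matrix and obtains the v_i from a sample by the inverse relation v = (row 1 · x/√n, row 2 · x, ..., row n · x). It then lets one check (54), (55) and (49) numerically. The helper names and the sample values are hypothetical. As in the text, the sketch assumes that the non-zero a_i come first.

```python
import math


def completion_rows(a):
    """Rows of an orthogonal matrix whose first row is a (sum of a_i^2 = 1,
    a_1..a_m non-zero, the rest zero), following Eqs. (50)-(53)."""
    n = len(a)
    m = sum(1 for ai in a if ai != 0.0)
    rows = [list(a)]
    partial = a[0] ** 2                      # a_1^2 + ... + a_{k-1}^2
    for k in range(1, m):                    # 0-based k gives the row of B_{k+1}
        total = partial + a[k] ** 2          # a_1^2 + ... + a_k^2
        b = a[k] / math.sqrt(partial * total)            # B of Eq. (50)
        rows.append([b * a[i] for i in range(k)]
                    + [-b * partial / a[k]]
                    + [0.0] * (n - k - 1))
        partial = total
    for i in range(m, n):                    # x_i = v_i where a_i = 0
        rows.append([1.0 if j == i else 0.0 for j in range(n)])
    return rows


def to_v(x, a):
    """v_1 = (row 1 . x)/sqrt(n), and v_k = row k . x for k >= 2."""
    n = len(x)
    w = [sum(r[j] * x[j] for j in range(n)) for r in completion_rows(a)]
    return [w[0] / math.sqrt(n)] + w[1:]


def pseudo_mean_sd(x, a):
    """Pseudo mean x'' of Eq. (46) and pseudo S.D. s'' of Eq. (47)."""
    n = len(x)
    xm = sum(ai * xi for ai, xi in zip(a, x)) / math.sqrt(n)
    s2 = sum(xi * xi for xi in x) / n - xm * xm
    return xm, math.sqrt(max(s2, 0.0))


obs = [2.0, -1.0, 0.5, 0.0, 0.0]             # hypothetical observed sample E' (m = 3)
norm = math.sqrt(sum(t * t for t in obs))
a = [t / norm for t in obs]                  # Eq. (43)
rows = completion_rows(a)
x = [0.3, -1.2, 0.8, 2.0, -0.5]              # any sample of the random x_i
v = to_v(x, a)
print(v, pseudo_mean_sd(x, a))
```

Because the rows are orthonormal, the first v equals the pseudo mean, and the sum of squares of the remaining v's equals n s''^2.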


Proof of part (ii) of proposition b. We must now prove the other statement (ii) on page 41 concerning ζ''. Namely, suppose that in the expression for ζ'' we substitute, instead of the random variables x_i, the particular observed values x'_i of (42) in terms of which the function ζ'' has been defined. We must prove that the value ζ''(E') of ζ'' is then found to be infinite. Replacing x_i by x'_i in Eq. (46), and remembering that the coefficients a_i therein have already been defined by Eq. (43) in terms of the x'_i, we easily find that the value of the pseudo mean calculated for the sample E' is

$$
x''(E') = \frac{x_1'^2 + x_2'^2 + \cdots + x_n'^2}{\sqrt{n}\,\sqrt{x_1'^2 + x_2'^2 + \cdots + x_n'^2}} = \sqrt{\frac{x_1'^2 + x_2'^2 + \cdots + x_n'^2}{n}} > 0
\qquad (56)
$$

since at least one of the numbers x'_i is different from zero. Further, substituting x'_i for x_i in Eq. (47) to calculate the pseudo S.D. s''(E'), we find it to be zero, because by (56) the square of x''(E') equals (1/n)Σx_i'^2. It follows from Eq. (45) that ζ''(E') = x''(E')/s''(E') = ∞, and this completes the proof of part (ii) of proposition b.


For the one particular sample E' already drawn, ζ'' has the value ∞, but in repeated sampling it follows the same law as z and ζ.
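Both parts of proposition b can be illustrated numerically. The Python sketch below fixes a hypothetical observed sample E' and builds the coefficients (43). It evaluates ζ'' on E' itself, which corresponds to part (ii). It then draws samples under H_0 with σ = 1 and compares the frequency of |ζ''| > 0.5 with that of |z| > 0.5, which corresponds to part (i). The threshold 0.5, the sample, the seed and the function names are arbitrary choices made for illustration. Because s''(E') is zero only up to rounding error, the code treats a pseudo variance within a small relative tolerance of zero as zero.

```python
import math
import random


def student_z(x):
    """Student's z = mean / s, with s^2 = (1/n) * sum (x_i - mean)^2."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    return mean / s


def zeta_pp(x, a):
    """zeta'' = x''/s'' of Eqs. (45)-(47), with coefficients a fixed by E'."""
    n = len(x)
    xm = sum(ai * xi for ai, xi in zip(a, x)) / math.sqrt(n)
    s2 = sum(xi * xi for xi in x) / n - xm * xm
    if s2 <= 1e-12 * max(1.0, xm * xm):       # s'' is zero up to rounding
        return math.copysign(math.inf, xm)
    return xm / math.sqrt(s2)


obs = [0.8, -0.3, 1.9, 0.4, -1.1]             # hypothetical observed sample E'
norm = math.sqrt(sum(t * t for t in obs))
a = [t / norm for t in obs]                   # Eq. (43)
print(zeta_pp(obs, a))                        # part (ii): infinite on E' itself

# Part (i): under H_0 (mu = 0, sigma = 1) zeta'' and z should share one law.
rng = random.Random(7)
N, c = 20000, 0.5
hits_z = hits_zeta = 0
for _ in range(N):
    x = [rng.gauss(0.0, 1.0) for _ in obs]
    hits_z += abs(student_z(x)) > c
    hits_zeta += abs(zeta_pp(x, a)) > c
print(hits_z / N, hits_zeta / N)              # two estimates of the same probability
```

The two frequencies printed last should differ only by sampling error. The first value printed is infinite, so a rule "choose the criterion least favorable to H_0" would always reject.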


It may be useful here to make the following remark. No number of examples is able to provide a proof of a general statement. On the other hand, the failure of a single example is sufficient to disprove any general statement. Our purpose here was to show that Principles I and II could not be generally applied for making a choice from criteria for testing hypotheses. The validity of the proof does not suffer from the fact that we have limited ourselves to the consideration of one particular example.





As a matter of fact, it is easily seen how the above reasonings could be generalized, but such a generalization would not produce anything new or really relevant.


4. General basis of the theory of testing statistical hypotheses.

I shall finish this lecture by indicating what appears to be the general basis of the theory of testing statistical hypotheses. We must start by considering the situation in its most general form.


(i) When we desire to test some particular statistical hypothesis H_0, we imply that it may be wrong. E.g. if we try to test Student's hypothesis that μ = 0, we admit the possibility that it may be wrong and that therefore μ may have some value other than zero. It will be seen that whenever we attempt to test a hypothesis we do admit, subconsciously perhaps, that there are hypotheses that are contradictory or, as we call them, alternative, to the one that is tested. There is no reason why these alternative hypotheses should not be considered explicitly when choosing an appropriate test.


(ii) Whenever we attempt to test a hypothesis we naturally try to avoid errors in judging it. This seems to indicate the right way of proceeding: when choosing a test we should try to minimize the frequency of errors that may be committed in applying it.
Bearing in mind the above two points (i) and (ii) we may proceed further and discuss the kinds of errors we may commit in testing any given hypothesis H_0. It is easy to see that there are two kinds:


After having applied a test we may decide to reject the hypothesis H_0 when in fact, though we do not know it, it is actually true. This is called an error of the first kind.

After having applied a test we may decide not to reject H_0 (this may be described for short by saying that we "accept H_0") when in fact H_0 is wrong, and therefore some alternative hypothesis H' is true. This is called an error of the second kind.

The test adopted should minimize the frequencies of both kinds of errors. Now let us see what is essentially the nature of any test, whatever be the principle on which it was chosen.


It is nothing but a rule according to which we sometimes reject the hypothesis tested and sometimes accept it (in the sense of the word explained above), according to whether the observations available possess some property specified by the rule. The observations are some n numbers

x_1, x_2, ..., x_n

the system of which could be represented by a point E in the n-dimensioned space W, having the x_i for the n coordinates*. The point E and the space W are called the sample point and the sample space. Any rule specifying cases where we should reject the hypothesis tested is equivalent to a specification of the positions of E within W which, if arrived at by observation, are to lead to a rejection of H_0. These positions usually make up a certain region, w, which is called the critical region.


In conclusion we see that to choose a test for a statistical hypothesis H_0, we must choose a critical region w in the sample space W and make a rule of rejecting H_0 whenever E, as determined by observation, falls within w.


Let us illustrate this by one example. Consider the case where a sampled population is divided into n categories and we test the hypothesis that the probability of an individual falling within the ith category has some specified value p_i, for i = 1, 2, ..., n. Denote by M the total number of observations and by m_i the number of them belonging to the ith category.

* It may be helpful here to refer back to Lecture I, page 16.


The generally accepted test of this hypothesis consists in rejecting it whenever

$$
\chi^2 = \sum \frac{(m_i - M p_i)^2}{M p_i}
\qquad (57)
$$

is "too large". What "too large" means is a subjective question, but there must be a more or less definite limit between values of chi-square that are "too large" and the others that are not. Let χ_0^2 denote this limit, and consider a space of n - 1 dimensions, the coordinates of any point being m_1, m_2, ..., m_{n-1}. As none of the m_i can be negative and their sum could not exceed M, the sample space W will be composed of points E with all coordinates m_1, m_2, ..., m_{n-1} being non-negative integers and satisfying the inequality

$$
m_1 + m_2 + \cdots + m_{n-1} \le M
\qquad (58)
$$


It is easily seen that the rule of rejecting H_0 whenever χ^2 > χ_0^2 is equivalent to considering the region w lying within W and outside the ellipsoid

$$
\sum \frac{(m_i - M p_i)^2}{M p_i} = \chi_0^2
\qquad (59)
$$

as the critical region.
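A minimal Python sketch of the critical region just described follows. A sample point is represented by the full vector of counts (m_1, ..., m_n) with total M, which is equivalent to the (n - 1)-dimensional description in the text. The limit χ_0^2 is left as a parameter. In the usage below it is set to 5.99, roughly the 5 per cent point of χ^2 with 2 degrees of freedom, purely for illustration. The probabilities, counts and function names are hypothetical.

```python
def chi_square(m, p):
    """Eq. (57): sum of (m_i - M p_i)^2 / (M p_i), with M the total count."""
    M = sum(m)
    return sum((mi - M * pi) ** 2 / (M * pi) for mi, pi in zip(m, p))


def in_sample_space(m, M):
    """E belongs to W: non-negative integer counts adding up to M."""
    return all(isinstance(mi, int) and mi >= 0 for mi in m) and sum(m) == M


def in_critical_region(m, p, M, limit):
    """E belongs to w: inside W and outside the ellipsoid (59)."""
    return in_sample_space(m, M) and chi_square(m, p) > limit


p = [0.25, 0.25, 0.5]                     # hypothetical specified probabilities p_i
M = 40
for m in ([14, 6, 20], [18, 4, 18]):
    print(m, chi_square(m, p), in_critical_region(m, p, M, 5.99))
```

The first point lies inside the ellipsoid (χ^2 = 3.2) and does not lead to rejection. The second (χ^2 = 10.2) falls in w.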

It is equally easy to see that any other test has a similar feature. For example, Student's test is equivalent to a rule of rejecting Student's hypothesis whenever the sample point falls within a circular hypercone with the axis

$$
x_1 = x_2 = \cdots = x_n
\qquad (60)
$$
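Why Student's region is a cone can be seen from a short computation. Let θ be the angle between the sample point E and the axis (60), and take s^2 = (1/n)Σ(x_i - x̄)^2. Then cos^2 θ = (Σx_i)^2/(nΣx_i^2) = x̄^2/(s^2 + x̄^2) = z^2/(1 + z^2). Hence |z| > z_0 exactly when θ (or π - θ) is smaller than arctan(1/z_0), that is, when E lies inside a double circular cone around the axis. The Python sketch below checks this numerically. The value z_0 = 0.75 and the helper names are hypothetical.

```python
import math
import random


def student_z(x):
    """Student's z = mean / s, with s^2 = (1/n) * sum (x_i - mean)^2."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    return mean / s


def cos2_to_axis(x):
    """Squared cosine of the angle between E and the axis x_1 = ... = x_n."""
    return sum(x) ** 2 / (len(x) * sum(xi * xi for xi in x))


def in_cone(x, z0):
    """E inside the double cone of half-angle arctan(1/z0) around the axis."""
    half_angle = math.atan(1.0 / z0)
    theta = math.acos(min(1.0, math.sqrt(cos2_to_axis(x))))  # folded into [0, pi/2]
    return theta < half_angle


rng = random.Random(3)
z0 = 0.75
for _ in range(5):
    x = [rng.gauss(0.5, 1.0) for _ in range(6)]
    z = student_z(x)
    print(z * z / (1 + z * z), cos2_to_axis(x), abs(z) > z0, in_cone(x, z0))
```

In each printed row the first two numbers coincide, and the two booleans agree.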





Having disposed of this we may go on and discuss the probabilities of errors. First of all: is it legitimate to discuss the probabilities of errors in testing statistical hypotheses? Isn't it equivalent to discussing the probabilities of hypotheses themselves, which would be useless? E.g., considering Student's hypothesis, it would be useless to discuss its probability because this would also be the probability of μ = 0. As μ is an unknown constant, the probability of its being equal to zero must be either P{μ = 0} = 0 or P{μ = 0} = 1. Without precise information as to whether μ is equal to zero or not, it is impossible to decide what is the value of P{μ = 0}.


To this criticism that may be suggested the answer is the following. Undoubtedly, μ is an unknown constant and, as far as we deal with the theory of probability as described in my first two lectures, it is useless to consider P{μ = 0}. On the other hand, our verdict concerning the hypothesis tested, H_0, depends on the position of the sample point E, that is to say, on its coordinates, and these, according to our assumptions, are random variables. It follows that our verdict is random and that there is no inconsistency in considering the probability of its having this or that property, for example of its being erroneous.


Consider the sample point E and any region w in the sample space. The probability of E falling within w may depend on the hypothesis that happens to be true. For example, if formula (4) represents the probability law of the x_i, and μ = 0, then the probability of E falling within some particular region w may have a certain value. On the other hand, if μ = 10, say, the same probability may be equal to 0.0001. Therefore we shall agree to denote by P{E ∈ w | H} the probability of E falling within w calculated on the assumption that the hypothesis H is true.


Now consider a hypothesis H_0 which we desire to test, and any region w which we have chosen to serve us as a critical region. What are the circumstances in which we commit an error of the first kind? They are: first, the hypothesis tested is true, and second, the sample point E falls within the critical region w, whereupon H_0 is unjustly rejected. It follows that the probability of an error of the first kind must be calculated on the assumption that H_0 is true. In fact, it is the probability

$$
P\{E \in w \mid H_0\}
$$

of E falling within w.


Now let us turn to errors of the second kind, defined on page 45. For an error of the second kind to be committed it is necessary (and sufficient) that the hypothesis tested be wrong and that the sample point fail to fall within the critical region selected. Therefore, if H' is the alternative hypothesis that is actually true, the probability of an error of the second kind is

$$
P\{E \notin w \mid H'\}
\qquad (61)
$$





Obviously, instead of considering the above probability of committing an error of the second kind we may consider that of avoiding it, which is denoted by β(w|H'), so that

$$
\beta(w \mid H') = P\{E \in w \mid H'\} = 1 - P\{E \notin w \mid H'\}
\qquad (62)
$$

β(w|H'), considered as a function of H', is described as the power (the power of detecting the falsehood of the hypothesis tested) of the region w with respect to the alternative hypothesis H'.
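The quantities P{E ∈ w | H_0} and β(w|H') can be estimated by simulation when no formula is at hand. The Python sketch below takes for w Student's region |z| > z_0 with n = 10 and z_0 = 2.262/3. Since t = z√(n - 1) has 9 degrees of freedom, this is the region |t| > 2.262, which should give P{E ∈ w | H_0} close to 0.05. The x_i are assumed normal with σ = 1. The number of trials, the seed and the function names are arbitrary choices. The value at μ = 0 estimates the probability of an error of the first kind. For μ ≠ 0, one minus the value estimates the probability of an error of the second kind.

```python
import math
import random


def student_z(x):
    """Student's z = mean / s, with s^2 = (1/n) * sum (x_i - mean)^2."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    return mean / s


def power(mu, n=10, z0=2.262 / 3, trials=20000, seed=0):
    """Monte Carlo estimate of beta(w|H) = P{E in w | H}, where w is |z| > z0 and
    H states that x_1..x_n are independent normal with mean mu and sigma = 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(mu, 1.0) for _ in range(n)]
        hits += abs(student_z(x)) > z0
    return hits / trials


for mu in (0.0, 0.25, 0.5, 1.0):
    print(mu, power(mu))
```

The estimates grow with μ and stay above the value at μ = 0, which is the behaviour of Student's region discussed in the paragraphs that follow.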


Any rational choice of a test must be made having regard to the properties of the power (62). In fact, the values of the power β(w|H) for a fixed region w and for a changing hypothesis H (which in particular may be H_0, the one we desire to test) describe no more and no less than the properties of the test based on the critical region w. Indeed, what is it that could be called "the properties of a test"? To know the properties of a test can mean nothing but to know (i) how frequently this test will reject the hypothesis H_0 tested when it is true, and (ii) how frequently it will disprove H_0 when it is wrong. That is exactly what the values of the function β(w|H) tell us. Without knowing the properties of β(w|H) we cannot very well say that we know the properties of the test based on w. And just these properties of the power seem to be the proper rational basis for choosing a test.


For example, considering the power of Student's test, it was possible to show that it has the following properties, which put it above any other test that might be suggested.

1. The probability of rejecting the hypothesis H_0 that μ = 0 is always greater when this hypothesis is wrong than in cases when it is right. This property is described by the adjective "unbiased" attached to the test possessing the property.

2. Any other unbiased test, if it leads to the same frequency of errors of the first kind, will less frequently detect the falsehood of the hypothesis tested when it is in fact wrong.


Details of the theory of testing statistical hypotheses based on considerations of the probabilities of errors will be found in the serial "Statistical Research Memoirs" issued by the Department of Statistics, University College, London (first volume issued in 1936). Various articles in that publication contain numerous references to similar works published elsewhere.


Editor's note: The reader may be puzzled by the sudden appearance of β(w|H) in Eq. (62). It may seem that P{E ∈ w|H} could as well be used as a symbol for the power of a test. The point is that P{E ∈ w|H} was introduced on page 16 to denote the probability of E falling within w, as determined by H, this probability being considered a function of w, H being fixed. It was called the integral probability law of the x_i as determined by H. On the other hand, β(w|H) is considered as a function of H, the region w being fixed. To go into more detail, one writes P{E ∈ w|H} if the emphasis is on the probability of E falling within w for a given set of parameters of the elementary probability law (page 16). In such a case one might be interested in seeing how the probability P{E ∈ w|H} varies with w. On the other hand, one writes β(w|H) if one is interested in seeing how various possible values of the parameters of the elementary probability law affect the probability of E being an element of a fixed region w.
RANDOMIZED AND SYSTEMATIC ARRANGEMENTS OF FIELD EXPERIMENTS

A conference with Dr. Neyman at the Cosmos Club in Washington, 7th April 1937, 2 p.m., Mr. Frederick F. Stephan, Secretary and Editor of the Journal of the American Statistical Association, presiding.


I am going to speak on a very controversial matter: whether systematically arranged agricultural trials could be treated with any success by means of mathematical statistics. Two eminent statisticians, who are also experts in agricultural experimentation, drastically disagree on this point, and each of them has a number of supporters. One of the scientists mentioned is Professor R. A. Fisher, who claims that in arranging the field experiments systematically we are bound to obtain all sorts of biases in our estimates and ruin the statistical tests.

The other scientist is "Student," who could rightly be considered as the father of statistical work in agricultural experimentation. He does not deny that the formulas usually applied to estimate the experimental standard error in both randomized and systematic trials are, in the latter case, a little biased, tending to overestimate the error. But his claim is that the actual accuracy of a systematic experiment is usually greater than that of a randomized one. Too high an estimate of the standard error is in his opinion not especially important, since it keeps the experimenter on the safe side.


Those of the present audience who are familiar with the material of my first two lectures must be aware that the answer to the question must be both empirical and subjective. The application of formulas of mathematical statistics to the results of agricultural trials presumes the existence of some mathematical model of these experiments. The question under consideration then reduces itself to whether the correspondence between this model and what happens in actual practice is sufficiently accurate. This question is exactly similar to whether the formulas of plane geometry could be applied to measure this or that area on the surface of the earth. Another similar problem, also mentioned in my second lecture (page 24), is whether formulas deduced from the Poisson law of frequency can be successfully used to estimate the probability that a colony on a Petri plate is produced by a single individual.


The empirical character of the answer arises from the fact that it involves trials in conditions of actual practice. The subjective character is unavoidable, because having the results of the trials and also the corresponding theoretical deductions from their mathematical model, we have to judge whether the agreement is or is not satisfactory. One of the ways by which the insufficiency of plane geometry may be revealed consists in subdividing an area of the type it is desired to measure into several suitable partial ones and measuring all of them separately. If the measure of the whole appears to be very different from the sum of the measures of the parts, then we would say that the assumption that the area measured is plane is too crude. But it will be up to us to decide whether the disagreement between the two measures is actually large or not, and in this respect personal opinions may vary.


Having this in view, I am going to give you a short account of the work recently done by Mr. C. Chandra Sekar* in the Department of Statistics, which provides the objective empirical part of the answer to the question discussed by Fisher and Student. The results that I shall describe are of the same character as those contained in my second lecture (pp. 19-32). On the one hand you will see figures representing frequencies of various results, as predicted from the mathematical models of the agricultural trials, and on the other the frequencies actually observed. If the agreement between the two is judged satisfactory, the conclusion will be that there is no special harm in arranging the experiments systematically. If, on the other hand, you find that the agreement is bad, you will require an alteration either of the mathematical model or of the experimental design, for example to have the trials randomized. But the question whether the agreement is satisfactory or not will be left to you.


Now I must enter into the details and describe the experiments that I have in mind. I shall deal with experiments of a very common type, in which the plots are rather narrow and long rectangles, all arranged in one row. They are combined in a few blocks, and within each block all the compared agricultural objects (varieties or treatments) are distributed in one way or another. This is the general description. Adding to it some details as to how the objects are distributed within the blocks, we shall obtain the full description of the two types of arrangements under discussion.


One of these is the so-called arrangement in randomized blocks. You know that in this arrangement each of the objects is repeated in each of the blocks the same number of times, e.g. once, and that the order in which the objects occur within each block is determined by random sampling. If the number of the compared objects is four and they are denoted by A, B, C, D, then in a randomized block experiment we may find the following distribution of the objects on the successive plots:


Block I          Block II          Block III
Box tee D aes eee eae 
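The randomized block arrangement can be sketched in a few lines of Python. Each block receives every object once, in an order drawn by random sampling. The estimate x̄_A - x̄_B mentioned in the next paragraph is simply the difference of the two objects' mean yields. The yields in the usage example are invented purely for illustration, and the function names are hypothetical.

```python
import random


def randomized_blocks(objects, n_blocks, rng):
    """One random permutation of the objects for each block."""
    return [rng.sample(objects, len(objects)) for _ in range(n_blocks)]


def mean_difference(layout, yields, first, second):
    """Mean yield of `first` minus mean yield of `second`;
    yields[b][j] is the yield of plot j in block b."""
    def mean_of(obj):
        ys = [y for block, row in zip(layout, yields)
              for o, y in zip(block, row) if o == obj]
        return sum(ys) / len(ys)
    return mean_of(first) - mean_of(second)


rng = random.Random(2024)
layout = randomized_blocks(["A", "B", "C", "D"], 3, rng)
print(layout)

# Invented yields for a fixed two-block layout:
fixed = [["A", "B", "C", "D"], ["B", "A", "D", "C"]]
yields = [[10.0, 8.0, 9.0, 7.0], [7.0, 11.0, 6.0, 9.0]]
print(mean_difference(fixed, yields, "A", "B"))   # (10 + 11)/2 - (8 + 7)/2 = 3.0
```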


This is one type of arrangement, and we know the formula by which to calculate the estimate of the true difference between the mean yields which any two of the objects compared, say A and B, are able to give if sown over the whole field. It is the difference between the means x̄_A - x̄_B of the observed yields. We also know how to calculate an unbiased estimate s^2 of the variance of our result. Owing to the fact that the observations referring to one block are mutually dependent (e.g. if the object A got the best of the four plots, then the object B must have got one of the worse ones), the further theory is not entirely clear.* **

* Just where Mr. Chandra Sekar's paper will be published has not yet been decided.


It is probable, however, that the application of the t test gives results very much in accordance with its theory. The hypothesis tested, namely that there is no difference between the mean yields of the objects compared, is rejected, both when it is true and when it is wrong, with relative frequencies in good accordance with the mathematical tables.


Many practical agriculturists find that the distribution over the 
field of the objects compared, if left to chance, is not always satis- 
factory. For example they would object to the variety B being sown twice 
on adjoining plots. In their opinion, the conditions in which the partic- 
ular objects are compared should be as equal as possible, and they 
think that this is best attained by some systematic distribution of the 
objects, such as 


etc.





Frequently, though not always, a field experiment arranged in the above manner is treated statistically by means of the formulas mentioned above, which were meant for randomized block experiments. There is no doubt that from the point of view of theory such a procedure is wrong. The theory of randomized blocks assumes specifically that the blocks are randomized, and its validity is easily shown to depend upon this assumption. But it is a question how large the discrepancies between theory and practice arising from the disregard of this condition are.


The above systematic arrangement is very popular in Poland. I have spent much time and wasted much paper trying to persuade practical experimenters to randomize their blocks, but with disappointing success. Then the thought occurred to me that the agreement between theory and practice may be attained not only by altering the practice, but also by adjusting the theory. Consequently I produced a paper*** giving a statistical theory of field trials arranged systematically.

* J. Neyman with cooperation of K. Iwaszkiewicz and S. Kolodziejczyk, "Statistical problems in agricultural experimentation," Supplement to the J. Royal Stat. Soc., 2, 107-180, 1935.

** B. L. Welch, "On the z test in randomized blocks and Latin squares," Biometrika 29, 21-52, 1937.

*** J. Neyman, "The theoretical basis of different methods of testing cereals, Part II: The method of parabolic curves," published by K. Buszcynski & Sons, Ltd., Warsaw, 1929. Price, about 50 cents.


The general lines are as follows. It is assumed that the natural level of fertility along the field may be adequately represented by a parabola of some not very high order, say the 4th. Let u denote the coordinate of the center of any of the plots, counting from the left, so that

$$
u = 1, 2, 3, \ldots
\qquad (1)
$$

Then the true yield of A, if it were tested on the u-th plot, would be

$$
A(u) = A + bu + cu^2 + du^3 + eu^4
\qquad (2)
$$


where A is a term depending on the object* A (treatment or variety), and b, c, d, and e are unknown constant coefficients. A is here used to signify both the thing being tested (treatment or variety) and the true value (as the yield) of the thing being tested. Experience has shown, however, that confusion does not arise, and in fact the symbolism is a very convenient one. The true yield of the object B, if it were sown on the same plot, would be given by

$$
B(u) = B + bu + cu^2 + du^3 + eu^4
\qquad (3)
$$

where B again depends on the object B, but the other constants b, c, d, and e are the same as in Eq. (2). Similar relations are written for C, D, etc., with b, c, d, and e the same for all.


In actual experiments we do not obtain what we call the "true" yields. What we obtain is the sum of the true yield plus an experimental error, due to various factors, such as inaccuracies in measuring plots, in treatment, damage by birds, etc. My assumption was that these experimental errors on particular plots are independent of each other. I then applied the Markoff** theorem to get the estimates of differences like

B - A, C - A, etc.,

and of their respective variances.
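A minimal sketch of how the model (2) and (3) can be fitted, written in Python. The yield on plot u is taken as the object's term plus b u + c u^2 + d u^3 + e u^4. The coefficients are estimated by ordinary least squares, which is what the Markoff theorem justifies if, in addition to independence, the errors are assumed to have a common variance; that extra assumption is ours. For numerical stability the polynomial is written in a rescaled position t = (u - ū)/h. This changes b, c, d, e but not the estimate of B - A, because the constant part of the polynomial is absorbed equally into the A and B terms. The layout (six ABBA groups), the fertility coefficients and the function names are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def estimate_difference(layout, yields):
    """Least-squares estimate of B - A when the yield on plot u = 1, 2, ...
    is the object's term plus a 4th-degree polynomial in u."""
    N = len(layout)
    mid, half = (N + 1) / 2, (N - 1) / 2
    X = []
    for u, obj in enumerate(layout, start=1):
        t = (u - mid) / half                    # rescaled plot position
        X.append([1.0 if obj == "A" else 0.0, 1.0 if obj == "B" else 0.0,
                  t, t ** 2, t ** 3, t ** 4])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(6)] for i in range(6)]
    Xty = [sum(row[i] * y for row, y in zip(X, yields)) for i in range(6)]
    coef = solve(XtX, Xty)                      # normal equations
    return coef[1] - coef[0]


layout = list("ABBA" * 6)                       # 24 plots in one row
A_true, B_true = 50.0, 53.5                     # hypothetical object terms
b, c, d, e = 0.8, -0.05, 0.002, -3e-5           # hypothetical fertility parabola
yields = [(A_true if obj == "A" else B_true) + b * u + c * u ** 2 + d * u ** 3 + e * u ** 4
          for u, obj in enumerate(layout, start=1)]
print(estimate_difference(layout, yields))      # recovers B - A = 3.5 up to rounding
```

With error-free yields the fit reproduces the true difference. Adding independent errors to `yields` would give an estimate scattered around 3.5.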

* Dr. Neyman uses the word objects here to cover whatever is being compared in the experiment. The objects might be treatments, or varieties, etc. Special features exist in a comparison of varieties; other features exist for fertilizers. In the present discussion, the analysis applies as well to a comparison of treatments as to a comparison of varieties, and the word object is used to refer to either. A more felicitous choice might have been found, but inquiries have not yet brought forth anything better. Editor.

** See also F. N. David and J. Neyman, "An extension of the Markoff theorem on the least squares," Statistical Research Memoirs, vol. II (in preparation; published by the Department of Statistics, University College, London W.C.1).
Granting the assumptions, the theory is correct. It certainly corresponds more exactly to the practice of the systematic experiments than the theory of randomized blocks does, but for a long time there was no answer to the question of what this correspondence means in figures. Now some numerical evidence is available. At least to my mind, it indicates that the theory does correspond to what happens in practice, at least in one particular type of systematic arrangement called the half drill strip.


This was invented by Dr. E. S. Beaven,* who used it with great success
in breeding his renowned varieties of barley. The half drill strip
experiments are designed to compare only two objects A and B, say two
varieties. They are sown in long narrow plots, half the drill sowing
A, the other half B. The varieties are repeated in a systematic order
as follows:

   Sandwich I    Sandwich II    Sandwich III
   A B B A       A B B A        A B B A       ...        (4)
You see that four consecutive plots form what is called a sandwich:
two half drill strips with B, sown in opposite directions, are enclosed
between two with A, also sown in opposite directions. These
sandwiches obviously correspond to blocks, but you will see that those
blocks are not randomized.


We must distinguish here between two kinds of randomizing the
blocks of four plots to be occupied by two varieties only. One would be
a totally unrestricted randomization, allowing arrangements like

AABB, ABAB, ABBA, BAAB, BABA, BBAA     (5)


The second kind of randomizing would consist in randomizing the sandwich.
This would admit only two arrangements of the block, either ABBA or BAAB,
and the choice between them should be based on some random experiment
such as tossing a coin.
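The two kinds of randomization can be made concrete with a short Python sketch. It is only an illustration, and the function names are hypothetical. It enumerates the six orders admitted by unrestricted randomization, list (5). It then draws ABBA or BAAB for each sandwich by a simulated coin toss, as the second kind of randomization requires.

```python
import itertools
import random

def unrestricted_arrangements():
    """All distinct orders of two A plots and two B plots in a block of four."""
    return sorted({"".join(p) for p in itertools.permutations("AABB")})

# The second kind of randomization admits only the two sandwich forms.
SANDWICH_FORMS = ("ABBA", "BAAB")

def randomize_sandwiches(n_sandwiches, rng):
    """Pick ABBA or BAAB independently for each sandwich, like a coin toss."""
    return [rng.choice(SANDWICH_FORMS) for _ in range(n_sandwiches)]

print(unrestricted_arrangements())
# ['AABB', 'ABAB', 'ABBA', 'BAAB', 'BABA', 'BBAA']
print(randomize_sandwiches(5, random.Random(1)))
```

Passing an explicit `random.Random` instance keeps a given randomization reproducible.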


If the sandwiches are randomized as just described, and if $x_i$ denotes
the difference between the sum of the two yields of A and the two
yields of B observed on the i-th sandwich, then the ordinary theory of
randomized blocks is applicable to the $x_i$. But this is not so certain
with respect to a systematic arrangement like (4). Of course, the
arrangement (4) may be treated by the method of parabolic curves as described
above. It is a matter of an easy adjustment of a few formulas
and of preparing tables to facilitate the calculations. But here again
we come to the question whether the scheme underlying the method of parabolic
curves corresponds sufficiently closely to what happens in practice.

* E. S. Beaven, Jour. Ministry of Agriculture 29, 436-444, August 1922.


I shall now discuss the question of what empirical data are needed 
for deciding whether any particular mathematical model corresponds to the 
experiments. 


When comparing any two objects A and B, of which the former is
some established standard, we may desire to obtain evidence that B is
better than A. This reduces to the test of the statistical hypothesis
$H_0$ that the true average yield of B if sown on the whole field, say $\mathfrak{B}$,
does not exceed that of A, say $\mathfrak{A}$. That is, $H_0$ is the hypothesis

$$\mathfrak{B} - \mathfrak{A} \le 0 \tag{6}$$


Whichever of the mathematical schemes described is applied, the
test of $H_0$ consists (i) in calculating the estimate, say $x$, of
$\Delta = \mathfrak{B} - \mathfrak{A}$, (ii) in calculating the estimate $s'^2/n$ of the variance of $x$, and
(iii) in referring the ratio $t = x/(s'/\sqrt{n})$ to Fisher's table of $t$. If
the observed value of $t$ exceeds the value $t_P$ corresponding to some
small value of $P$, say 0.05 or 0.01, then the hypothesis $H_0$ is rejected
and we consider that we have "evidence" of B being able to give average
yields greater than A.*


The whole question under discussion, i.e., whether the field
trials must be randomized and whether non-randomized trials introduce any
sort of bias into the statistical tests, reduces to the following:

(1) Whether, in cases when the hypothesis tested, $H_0$, is true,
and in particular when $\mathfrak{A} = \mathfrak{B}$, the value of $t = x/(s'/\sqrt{n})$ calculated
by this or that method exceeds the fixed value $t_P$ with the
frequency $P$ prescribed by the theory.

(2) Whether, in cases when the hypothesis $H_0$ is wrong and thus
$\mathfrak{B} - \mathfrak{A} = \Delta > 0$, the t test detects this circumstance, the value of $t$
falling above the critical $t$ with the frequency predicted by the theory
for the prescribed magnitude of $\Delta$.
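Under the idealized model of independent normal errors, both questions can be checked by simulation. The Python sketch below only illustrates what "the frequency prescribed by the theory" means. It uses simulated normal differences, not uniformity-trial data, and its names are hypothetical. With 6 sandwich differences (5 degrees of freedom), the one-sided 5 per cent point of t is 2.015. When there is no true difference, about 5 per cent of simulated t values should exceed it.

```python
import math
import random

def t_statistic(xs):
    """t = mean / (s' / sqrt(n)), where s'^2 has n - 1 degrees of freedom."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean / math.sqrt(var / n)

def rejection_frequency(n, t_crit, delta=0.0, sigma=1.0, reps=20000, seed=0):
    """Share of simulated experiments whose t exceeds t_crit.

    Each experiment has n independent normal differences with mean delta
    (the idealized model, not uniformity-trial data).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(delta, sigma) for _ in range(n)]
        if t_statistic(xs) > t_crit:
            hits += 1
    return hits / reps

# 6 differences, 5 degrees of freedom; one-sided 5 per cent point of t = 2.015.
print(rejection_frequency(6, 2.015))              # close to 0.05 (question 1)
print(rejection_frequency(6, 2.015, delta=1.0))   # much larger when B is better
```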


If on any empirical evidence either of the above two questions
were to be answered in the negative, then we should say that the mathematical
model that served as a basis for calculating $t = x/(s'/\sqrt{n})$ does
not correspond to the actual trials, and either the model or the experimental
design should be altered. If, however, a considerable volume of
empirical data fails to contradict either (1) or (2), then the practical man would
probably say that from a purely academic point of view (which may be interesting
by itself) there may be disagreements between the experimental
technique and its mathematical model, but that these disagreements do not
concern him. In fact the statistical test gives all it is expected to
give: it rejects the hypothesis tested when it is in fact true, and detects
its falsehood when it is wrong, with about the same frequencies
as predicted by theory, so the experimenter knows where he is.

* At this point the reader may wish to refer back to pages 28 and 29.


It is seen, therefore, that the whole question is reduced to what
is the actual empirical distribution of values of $t$ in cases when $\mathfrak{A} = \mathfrak{B}$,
and in cases when $\mathfrak{B} - \mathfrak{A} = \Delta > 0$. We must discuss the question of how such
empirical distributions could be obtained.


It is easier to obtain an empirical distribution of $t$ for the case
when $\mathfrak{A} = \mathfrak{B}$. We have to use for this purpose the results of so-called
uniformity trials. Imagine a large field divided into a number of very
small plots, considerably smaller than the ones used for actual experiments.
To avoid misunderstanding we shall call them elementary plots.
If you treat all those plots in exactly the same way, so far as possible,
and sow them with the same variety, you will have a uniformity trial.

The results of such trials, represented by a plan of the experimental
field with the yields of single elementary plots, are to be found in
various publications. Not all of them, however, are equally suitable for
our purpose, mainly because the elementary plots used are not sufficiently
small, or because they differ considerably from squares. If the elementary
plots are very tiny squares, then they can be combined in various ways
to form what could be real experimental plots. If we wish to see what
would be the results of some particular experiment on this field, such as
comparing some objects A, B, ..., which are in fact identical (though
we are not aware of it), we simply assign these hypothetical objects to
particular plots, then perform all the calculations on the figures
provided by the uniformity trial and apply the tests that we should
apply if we had to deal with an actual experiment. If the elementary
plots are large or very long, then the same procedure can be applied,
but it may be hard to produce experimental plots of the desired size and
shape.
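Combining tiny elementary plots into experimental plots and assigning hypothetical objects to them might be sketched as follows. This is a minimal illustration. It assumes the uniformity trial is stored as a rectangular grid of yields. The helper names and the toy grid are hypothetical.

```python
def combine_plots(grid, rows_per_plot, cols_per_plot):
    """Sum rectangular blocks of elementary plots into experimental plots.

    grid[r][c] is the yield of the elementary plot in row r, column c.
    Trailing rows or columns that do not fill a whole block are dropped.
    """
    n_rows = len(grid) // rows_per_plot
    n_cols = len(grid[0]) // cols_per_plot
    plots = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            row.append(sum(
                grid[r][c]
                for r in range(i * rows_per_plot, (i + 1) * rows_per_plot)
                for c in range(j * cols_per_plot, (j + 1) * cols_per_plot)
            ))
        plots.append(row)
    return plots

def assign_objects(n_strips, pattern="ABBA"):
    """Label consecutive strips with hypothetical objects by repeating a pattern."""
    return [pattern[k % len(pattern)] for k in range(n_strips)]

grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(combine_plots(grid, 2, 2))   # [[14, 22], [46, 54]]
print(assign_objects(6))           # ['A', 'B', 'B', 'A', 'A', 'B']
```

Since all plots of a uniformity trial carry the same variety, any labels assigned this way describe objects that are in fact identical.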





For our purpose we should need uniformity trials with elementary
plots that could be combined into half drill strips. Suppose that many
such hypothetical half drill strips are available in the form of a table
like the following, where each parallelogram represents a half drill strip
and the figure written on it is the sum of the yields of the elementary
plots of the uniformity trial of which the experimental plot is composed.
Those would be the actual yields obtained on these plots in an experiment
with two hypothetical but identical varieties A and B. Writing successive
letters A, B, B, A, etc. on the plan of the hypothetical experiment
(as shown), and applying any given mathematical model, we can calculate
$t$, knowing that it refers to the case where $\mathfrak{A} = \mathfrak{B}$. A number of
such values of $t$, independently calculated, will produce the distribution
we want to compare with the theoretical one deduced by Student, namely

$$p(z) = \frac{\Gamma(n/2)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)}\,(1+z^2)^{-n/2}, \tag{7}$$

where $t^2 = z^2(n-1)$ and $n-1$ is the number of degrees of freedom on
which the estimate $s'^2$ is based.
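Student's distribution above is written in terms of z. Substituting $t = z\sqrt{n-1}$ gives the familiar density of t with $\nu = n - 1$ degrees of freedom. The Python sketch below uses only the standard library, and its names are hypothetical. It evaluates that density and obtains upper-tail probabilities by Simpson's rule. This is enough to reproduce tabled points such as $P(t > 2.228) \approx 0.025$ for 10 degrees of freedom.

```python
import math

def student_density(t, df):
    """Density of Student's t with df degrees of freedom (df = n - 1)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t0, df, steps=2000):
    """P(t > t0) for t0 >= 0: Simpson's rule on [0, t0], then symmetry."""
    if t0 < 0:
        raise ValueError("t0 must be non-negative")
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    h = t0 / steps
    s = student_density(0.0, df) + student_density(t0, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * student_density(k * h, df)
    return 0.5 - s * h / 3

print(round(upper_tail(2.228, 10), 4))   # about 0.025
```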


If the sandwiches are randomized, then the estimate of $\Delta$ is
simply the arithmetic mean $\bar{x}$ of the numbers $x_i$ as defined above, and

$$\frac{s'^2}{n} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n(n-1)}. \tag{8}$$
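Here is a sketch of the calculation for randomized sandwiches, formula (8), in Python. It assumes the strip yields are listed in field order, four per sandwich, with a label string ("ABBA" or "BAAB") for each sandwich. The function names are hypothetical. $x_i$ is taken as the B total minus the A total, so that positive values favor B. Because $x_i$ is a difference of sums of two yields, $\bar{x}$ estimates $2\Delta$ rather than $\Delta$. The factor 2 cancels in $t$.

```python
import math

def sandwich_differences(yields, patterns):
    """x_i = (sum of the two B yields) - (sum of the two A yields) per sandwich.

    yields: strip yields in field order, four per sandwich;
    patterns: one label string, "ABBA" or "BAAB", per sandwich.
    """
    if len(yields) != 4 * len(patterns):
        raise ValueError("need exactly four strips per sandwich")
    xs = []
    for i, pattern in enumerate(patterns):
        block = yields[4 * i:4 * i + 4]
        xs.append(sum(y if obj == "B" else -y for y, obj in zip(block, pattern)))
    return xs

def t_from_differences(xs):
    """Return (mean, s'^2 / n, t) following formula (8)."""
    n = len(xs)
    mean = sum(xs) / n
    var_of_mean = sum((x - mean) ** 2 for x in xs) / (n * (n - 1))
    return mean, var_of_mean, mean / math.sqrt(var_of_mean)

xs = sandwich_differences([10, 12, 13, 11, 9, 10, 12, 10], ["ABBA", "ABBA"])
print(xs)                       # [4, 3]
print(t_from_differences(xs))   # (3.5, 0.25, 7.0)
```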


The first authors to run tests on uniformity trial data to see
whether the distribution of $\bar{x}/(s'/\sqrt{n})$ from non-randomized sandwiches
follows Student's distribution of $t$, so far as I am aware, were S. Barbacki
and R. A. Fisher.* They came to the conclusion that the lack of randomization
is destructive to the t test, and they blame Student for
thinking differently. It seems to me, however, that they were a little
unfair to Student, and that the figures they adduced are not sufficient
to support their statement.


They took just one uniformity trial in which weights of yields
of wheat on short parts of single rows were published.** They combined
adjoining rows to obtain the width of a half drill strip. The rows
were sufficiently long, and they divided them into 12 columns and so
obtained 12 columns of half drill strips, each being a continuation
of the strips in other columns. Then they said that these columns
would represent the results of six hypothetical experiments comparing some
variety A with another B. Experiment No. 1 would consist of sandwiches in
column 1 and column 7; experiment No. 2 would consist of sandwiches in
columns 2 and 8, etc., as marked in the figure. Then they calculated $t$
for each such experiment and were pleased to find that, in spite of the
fact that the hypothetical varieties A and B were identical, the distribution
of the empirical $t$ was far from similar to the theoretical one.

[Plan: columns numbered (1), (2), (3), ...; rows grouped into Sandwich I, Sandwich II, etc.]

* S. Barbacki and R. A. Fisher, Annals of Eugenics 7, 189-193, 1936.
** G. A. Wiebe, "Variation and correlation in grain yield among 1500
wheat nursery plots," J. Agric. Res. 50, 331-357, 1935.

In fact all of the $t$ had the same sign! This, of course, should be
expected, since the $t$ thus calculated were not independent. It is known that
the direction of rows is frequently that of ploughing and that in this
direction we frequently observe what I call waves of fertility: if one
of the plots in the first row (see plan on page 56) is better than the
corresponding plot in the second, then this is likely to be true for all
other plots in these rows. These waves of fertility are very marked on
the field used by Barbacki and Fisher, and consequently the value of $t$
calculated for any one of these hypothetical experiments could not be much
different from the one for any of the others. The whole argument is as
if we were to toss a penny just once, look at it six times and, having
recorded six heads, argue that the penny must be biased. The authors are
unfair to Student because he called attention to the fact that parts of
the same strip are highly correlated among themselves.*
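The penny analogy can be illustrated by simulation. The sketch below assumes a hypothetical field in which each row (one half drill strip) has its own fertility level, constant along the row, plus small independent plot errors. It then forms six experiments from columns j and j + 6, following the Barbacki and Fisher plan. If the six t values were independent, all six would share a sign on only about 2 × (1/2)^6 ≈ 3 per cent of fields. With strong row waves this happens far more often, even though A and B are identical. All parameter values are illustrative.

```python
import math
import random

def t_statistic(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean / math.sqrt(var / n)

def six_t_values(rng, wave_sd, noise_sd=0.1, n_rows=16, n_cols=12):
    """t for six hypothetical experiments on one simulated field.

    Row r (one half drill strip) has fertility wave[r], constant along the row,
    plus independent plot errors. Rows 4k..4k+3 form an ABBA sandwich, and
    experiment j uses the sandwiches in columns j and j + 6.
    """
    wave = [rng.gauss(0.0, wave_sd) for _ in range(n_rows)]
    y = [[wave[r] + rng.gauss(0.0, noise_sd) for _ in range(n_cols)]
         for r in range(n_rows)]
    half = n_cols // 2
    ts = []
    for j in range(half):
        xs = [y[k + 1][c] + y[k + 2][c] - y[k][c] - y[k + 3][c]
              for c in (j, j + half) for k in range(0, n_rows, 4)]
        ts.append(t_statistic(xs))
    return ts

def share_all_same_sign(wave_sd, fields=500, seed=0):
    """Fraction of simulated fields on which all six t have the same sign."""
    rng = random.Random(seed)
    same = 0
    for _ in range(fields):
        ts = six_t_values(rng, wave_sd)
        if all(t > 0 for t in ts) or all(t < 0 for t in ts):
            same += 1
    return same / fields

print(share_all_same_sign(wave_sd=1.0))   # strong row waves: most fields
print(share_all_same_sign(wave_sd=0.0))   # no waves: near 2 * (1/2)**6 = 0.031
```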





It follows that we could not accept the results of Barbacki and
Fisher as conclusive in the question we are interested in. Their figures
emphasize only the known fact that there is danger in replicating
an arrangement on plots in adjoining columns, as an error in one of them
is likely to be repeated in the others. This does represent an advantage
of the randomized arrangements, but does not show that systematic experiments,
if carried out with due precautions, necessarily give biased
results.


There is no doubt, however, that the application of the formula
(8) does represent a crude treatment. This was recognized by Student
who, in his paper published in the Supplement to the Journal of the
Royal Statistical Society, vol. III, pp. 114-136, 1936, suggested a new
way of proceeding. This is based on the hypothesis that the level of
fertility along the row of drill strips is either rising or falling
off more or less regularly, so that, within each pair of the half drill
strips, the fertility of the next half drill strip differs from that of
the preceding one by a fixed quantity, which Student called the linear
fertility slope. There is no doubt again that this assumption does not
correspond exactly to what happens in practice, but the formulas that
the new mathematical model involves (let it be called the new Student's
method) have a greater chance of giving satisfactory results than
formula (8). In fact, this method, along with that of the parabolic
curves, is based exclusively on the assumption that the experiment is
arranged systematically. Whether it works satisfactorily must be tested
empirically.

* 'Student', "On testing varieties of cereals," Biometrika 15, 271-293,
1923; see pp. 286-287 in particular.
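The arithmetic behind a linear fertility slope can be checked in a few lines. This is not Student's 1936 procedure. It is only a hypothetical illustration with no real difference between A and B and fertility rising by a fixed amount from strip to strip. Under those conditions the B minus A contrast vanishes for a sandwich ABBA (or BAAB) but not for ABAB.

```python
def contrast(pattern, yields):
    """Sum of the B yields minus sum of the A yields within one block."""
    return sum(y if obj == "B" else -y for obj, y in zip(pattern, yields))

# Hypothetical block: no varietal difference, purely linear fertility slope,
# strip k has fertility base + slope * k.
base, slope = 50.0, 3.0
trend = [base + slope * k for k in range(4)]

print(contrast("ABBA", trend))   # 0.0: the slope cancels within the sandwich
print(contrast("ABAB", trend))   # 6.0: the slope biases this arrangement
```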
Some work designed to throw light on the question we are interested
in has been done by one of my students, Mr. C. Chandra Sekar. He
tried to collect as many uniformity trial data as he could possibly find,
and on each field he arranged a number of independent hypothetical
experiments in systematic half drill strips. The total number of these
experiments was 120. For each of them he calculated $t$, once using the
formula implied by the new Student's method and next applying the
slightly more complicated procedure of the method of parabolic curves
(pp. 51-52). The two diagrams on the next page give the results.*


These are the objective results referring to question (1) above,
concerning the case where the compared objects are identical. They may
be supplemented by the probabilities of getting something worse merely
by chance. In the new Student's method this probability is 0.173 and
with the method of parabolic curves it is 0.643. These figures seem to
indicate a certain advantage of the method of parabolic curves.
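The criterion behind these probabilities is described in a footnote as Neyman's "smooth" test of goodness of fit. A minimal Python sketch of such a test follows, under stated assumptions. Each empirical t is first converted to $u = F(t)$ with Student's distribution function, so that under the theory the u values are uniform on (0, 1). The statistic, here written $\psi^2_k$, sums the squared totals of the first k orthonormal Legendre polynomials and is referred to $\chi^2$ with k degrees of freedom. The number of components actually used by Mr. Chandra Sekar is not stated here, so k = 4 is only an assumption. The function name is hypothetical.

```python
import math

# Orthonormal (shifted) Legendre polynomials on [0, 1].
LEGENDRE = [
    lambda y: math.sqrt(3) * (2 * y - 1),
    lambda y: math.sqrt(5) * (6 * y**2 - 6 * y + 1),
    lambda y: math.sqrt(7) * (20 * y**3 - 30 * y**2 + 12 * y - 1),
    lambda y: 3 * (70 * y**4 - 140 * y**3 + 90 * y**2 - 20 * y + 1),
]

def smooth_test(us, k=4):
    """Smooth statistic psi^2_k for values us that should be uniform on (0, 1).

    Returns (psi2, p_value), referring psi2 to chi-square with k d.f.
    The closed-form chi-square tail used here requires k = 2 or 4.
    """
    if k not in (2, 4):
        raise ValueError("k must be 2 or 4 in this sketch")
    n = len(us)
    psi2 = sum(sum(p(u) for u in us) ** 2 for p in LEGENDRE[:k]) / n
    half = psi2 / 2
    p_value = math.exp(-half) * sum(half**j / math.factorial(j) for j in range(k // 2))
    return psi2, p_value

evenly_spread = [(i + 0.5) / 40 for i in range(40)]
print(smooth_test(evenly_spread))   # tiny psi2, p-value close to 1
```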


Having those objective empirical results, it is now a personal
question whether to consider them favorable or unfavorable to Student's
opinion that, so far as the validity of the t test is concerned, there is
no special harm in the lack of randomization of the sandwiches. If you
want my personal opinion on this point, it is this: were the lack of
randomization of the sandwiches really disastrous for the t test in the
case of the two compared objects being identical, then the available
empirical material would have demonstrated this circumstance. Some
divergence probably exists, and if more empirical data were available it
would doubtless become apparent. However, the divergence could not be
of very great importance. I always prefer to deal with mathematical
models corresponding as closely as possible to practical experiments.
Therefore, if the practical agriculturist insists on his sandwiches
being systematically replicated, I would advise him to work them out
using the method of parabolic curves rather than the new Student's
method based on a hypothesis concerning the level of fertility, which
seems a little artificial. The empirical results (the graphs on the
next page) seem to show that this is safer.


We may now turn to the next question and see the results of
trials with systematically arranged sandwiches when the object B does
give greater average yields than the object A. The theory of the
t test referring to this case is not so generally known as the results
for the case of no difference between A and B, and I will remind you of
certain important points.


* In regard to the test criterion appearing in the diagrams on the next
page, it may be said that this is a criterion for testing goodness of
fit which, for large samples, proves to be unbiased and most powerful
with respect to the alternatives that are "smooth." Its theory was
briefly described by J. Neyman in the Comptes rendus 203, 1047-1049,
1936, with a further note in the same volume on pp. 1211-1213. The
"smooth" test for goodness of fit will appear in the Skandinavisk
Aktuarietidskrift, pp. 149-199, 1937 (in English).
(i) It has been shown* that the superiority of B over A will be
discovered by the t test more frequently than by any other imaginable test.

(ii) The frequency of the t test failing to detect a difference
$\mathfrak{B} - \mathfrak{A}$ when it actually exists and is equal to $\rho$ times the true standard
error $\sigma$ of $x$ is known and depends on the number of degrees of freedom on
which the estimate of $\sigma$ is based. This is what is technically called
the probability of an error of the second kind; see Lecture III, p. 45.
The first short table of this kind was published by S. Kolodziejczyk.**
This was later supplemented in a joint paper by myself, K. Iwaszkiewicz,
and S. Kolodziejczyk,# wherein certain graphs are published, of which
two are shown on page 61. Finally, a differently arranged table was
published by Miss B. Tokarska and myself.+


In these graphs $n$ means the number of degrees of freedom on which
the estimate of error variance is based. Further, $\alpha$ means the fixed
level of significance you work on. To make this diagram clear let us
consider an example. Suppose you are arranging a randomized blocks
experiment with six treatments and three replications. In this case
$n = (6-1)(3-1) = 10$. From previous experience you know that the standard
error per plot is likely to be, say, 10 per cent of the average yield, and
you want to know the probability that the experiment will fail to detect a
difference between your treatments as large as 20 per cent of the general
mean. Since each treatment mean is based on three plots, the expected
standard error of the difference between two treatment means is
$\sigma = 10\sqrt{2/3} = 8.16$. Your $\Delta = 20$, and thus $\rho = 20/8.16 = 2.45$.
From the diagram you find that the probability of the t test failing to
detect the difference between the treatments when it is as large as 20
per cent of the average yield is about 0.25 if $\alpha = 0.05$, and about 0.55
if $\alpha = 0.01$. You will probably decide that the experiment planned is
not sufficiently accurate, and you will try to increase the number of
replications.
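The arithmetic of this example can be reproduced with a short Monte Carlo sketch, which also gives a rough check on the two probabilities read from the diagram. It assumes normal independent errors and a one-sided test of B against A. It uses the tabled one-sided points of t for 10 degrees of freedom: 1.812 for α = 0.05 and 2.764 for α = 0.01. The function names are hypothetical. Under these assumptions the simulated probabilities come out in the neighbourhood of the values quoted from the graphs.

```python
import math
import random

def design_numbers(treatments, replications, plot_sd):
    """Error d.f. of randomized blocks and s.e. of a difference of two means."""
    df = (treatments - 1) * (replications - 1)
    se_diff = plot_sd * math.sqrt(2 / replications)
    return df, se_diff

def type2_probability(rho, df, t_crit, reps=20000, seed=0):
    """Monte Carlo P(t <= t_crit) when the true difference is rho standard errors.

    Normal model: t = (Z + rho) / sqrt(V / df), with V chi-square on df d.f.
    """
    rng = random.Random(seed)
    misses = 0
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)
        v = rng.gammavariate(df / 2, 2.0)   # chi-square with df d.f.
        if (z + rho) / math.sqrt(v / df) <= t_crit:
            misses += 1
    return misses / reps

df, se = design_numbers(6, 3, 10.0)
rho = 20 / se
print(df, round(se, 2), round(rho, 2))   # 10 8.16 2.45
print(type2_probability(rho, df, 1.812), type2_probability(rho, df, 2.764))
```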


Of course, these two points (i) and (ii) refer to the ideal case
of a complete correspondence between the experiments and the mathematical
model involving the normal distribution and mutual independence of
"errors." Our problem is to see whether the existing divergences from
this model influence the validity of the theoretical conclusions.


With regard to point (i) raised above, there are insurmountable
difficulties in this respect. There is no way to produce empirical
evidence that in any fixed conditions of experimentation it is impossible
to invent any test which would be more sensitive than the t test. If
any other test were suggested, then we could produce empirical results
comparing its sensitiveness to that of t, and this comparison might show
that the alternative test is better than t. But any number of such
comparisons, all of them favorable to t, could not prove that the t test
is actually the best. For this reason, and because no test alternative
to t has been suggested, we shall drop the question of empirically
testing point (i).

* J. Neyman and E. S. Pearson, Phil. Trans. Roy. Soc. of London A231,
289-337, 1933.
** S. Kolodziejczyk, Comptes rendus, Paris, t. 197, 814-816, 1933.
# Neyman, Iwaszkiewicz, and Kolodziejczyk, Supplement to the J. Roy.
Stat. Soc., Vol. II, 107-180, 1935; pp. 133 and 134 in particular.
+ J. Neyman and Miss B. Tokarska, J. Amer. Stat. Assoc. 31, 318-326,
1936.

These diagrams are reproduced from pages 133 and 134 of an article
"Statistical problems in agricultural experimentation" by J. Neyman,
K. Iwaszkiewicz, and S. Kolodziejczyk, Suppl. Journ. Royal Stat.
Soc. vol. II, 107-180, 1935.


The empirical test of point (ii) is much easier, though it requires
a lot of calculations. In fact the problem is very similar to
that dealt with in the case where A was identical with B. We start by
producing what could be the results of actual trials in half drill
strips, including the actual inequalities in soil fertility and the
actual experimental errors, in which, however, the true average yield of
B is greater than that of A by a given amount. For each such experiment we
should calculate the value of $t$ and see how frequently it fails to
exceed the critical tabled value of $t$, that is to say, how frequently the
t test fails to detect the advantage of B over A. This frequency must
then be compared with the probability of an error of the second kind to
be found from the tables mentioned on page 60 or from the graphs on
page 61.


In order to produce the quasi-empirical data for the above purpose
we use again the same uniformity trials that were used before. I
have mentioned on page 55 that on each of the fields with uniformity
trials it is possible to arrange more than one hypothetical experiment
in half drill strips. Each of them gives an estimate of the error
variance. Several such estimates were averaged, and this average was taken
as the true value of the error variance for the experiments on any
particular field.





To see more clearly what was done next, consider the situation on
any two particular fields. The assumed true standard deviations of the
estimates of $\mathfrak{B} - \mathfrak{A}$ on those fields are respectively $\sigma_1$ and $\sigma_2$. Using
the graphs of probabilities on page 61, the values $\rho(20)$, $\rho(40)$, $\rho(60)$,
and $\rho(80)$ of $\rho$ were found for which the probabilities of errors of the
second kind are 0.20, 0.40, 0.60 and 0.80. Those values were then
multiplied by $\sigma_1$ and $\sigma_2$ to obtain what I shall denote by $\Delta_1(20)$, $\Delta_2(20)$,
$\Delta_1(40)$, etc., so that, for example,

$$\Delta_1(20) = \sigma_1\,\rho(20), \qquad \Delta_2(20) = \sigma_2\,\rho(20), \quad \text{etc.}$$

You will notice that $\Delta_1(20)$ represents the value such that if the
difference between B and A tested on the first field were equal to $\Delta_1(20)$,
then the theoretical probability of the t test failing to detect the
advantage of B over A would be exactly equal to 0.20.
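Instead of reading $\rho(20), \ldots, \rho(80)$ from the graphs, one could compute them. The sketch below assumes the normal model and a one-sided test. It evaluates the probability of an error of the second kind by integrating over the chi-square distribution of the variance estimate. It then finds $\rho$ for a given probability by bisection and multiplies by each field's $\sigma_i$. The 10 degrees of freedom, the critical value 1.812, the two σ values, and all function names are hypothetical.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def chi2_density(v, df):
    if v <= 0.0:
        return 0.0          # valid for df > 2, where the density vanishes at 0
    return v ** (df / 2 - 1) * math.exp(-v / 2) / (2 ** (df / 2) * math.gamma(df / 2))

def type2_prob(rho, df, t_crit, steps=2000):
    """P(t <= t_crit) when the true difference is rho standard errors.

    Normal model: t = (Z + rho) / sqrt(V / df) with V ~ chi-square(df), so the
    probability is the integral of Phi(t_crit * sqrt(v / df) - rho) against
    the chi-square density, done here by Simpson's rule.
    """
    if df <= 2:
        raise ValueError("this sketch assumes df > 2")
    upper = df + 40.0 * math.sqrt(2.0 * df)   # far into the chi-square tail
    h = upper / steps

    def f(v):
        return normal_cdf(t_crit * math.sqrt(v / df) - rho) * chi2_density(v, df)

    s = f(0.0) + f(upper)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f(k * h)
    return s * h / 3

def rho_for(beta, df, t_crit, lo=0.0, hi=20.0, tol=1e-5):
    """Bisection for rho with type2_prob(rho) == beta; it falls as rho grows."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if type2_prob(mid, df, t_crit) > beta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

sigmas = {"field 1": 4.0, "field 2": 6.5}   # hypothetical sigma_1, sigma_2
for beta in (0.20, 0.40, 0.60, 0.80):
    rho = rho_for(beta, 10, 1.812)
    print(beta, round(rho, 3), {k: round(s * rho, 2) for k, s in sigmas.items()})
```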


Suppose that the values of $\Delta_i(20)$, $\Delta_i(40)$, $\Delta_i(60)$, and $\Delta_i(80)$ are
calculated for the i-th field. Take one of the hypothetical experiments in
the systematic half drill strips previously arranged on that field
from data of uniformity trials, and add $\Delta_i(20)$ to all the
hypothetical yields of the object B. Before this addition the variability
of yields from plot to plot was due solely to soil variation and technical
errors, since all the plots were equally treated and sown with the
same variety. After the addition of $\Delta_i(20)$ to the yields of the
hypothetical B, we obtain what could be the result of an actual trial of A
and B, including the effect of soil variation and technical errors,
A and B having the property that, whatever the true yield of A, the true
yield of B is greater by the amount $\Delta_i(20)$. That is what we want for
testing the distribution of $t$ when $\mathfrak{B} - \mathfrak{A} = \Delta_i(20)$.
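Adding $\Delta_i(20)$ to the B plots might be written as follows. This is a minimal sketch with hypothetical names and toy yields. Both B strips of every sandwich gain Δ, while the A strips, and hence the soil pattern, are untouched. As a result, each sandwich contrast (B total minus A total) rises by exactly 2Δ.

```python
def add_effect(yields, patterns, delta):
    """Add delta to every strip labelled B; A strips are left unchanged.

    yields: strip yields in field order, four per sandwich;
    patterns: one "ABBA"/"BAAB" label string per sandwich.
    """
    if len(yields) != 4 * len(patterns):
        raise ValueError("need exactly four strips per sandwich")
    labels = "".join(patterns)
    return [y + delta if obj == "B" else y for y, obj in zip(yields, labels)]

def differences(yields, patterns):
    """Per-sandwich contrast: sum of B yields minus sum of A yields."""
    labels = "".join(patterns)
    return [sum(y if obj == "B" else -y
                for y, obj in zip(yields[i:i + 4], labels[i:i + 4]))
            for i in range(0, len(yields), 4)]

uniformity = [10, 12, 13, 11, 9, 10, 12, 10]   # hypothetical strip yields
patterns = ["ABBA", "BAAB"]
print(differences(uniformity, patterns))                            # [4, -3]
print(differences(add_effect(uniformity, patterns, 1.5), patterns)) # [7.0, 0.0]
```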


Mr. C. Chandra Sekar calculated $t$ for each of the experiments in
such systematic sandwiches, obtained in the above way from the data of
uniformity trials. Again both the new Student's method and the method
of parabolic curves were tried. The results, in the form of frequencies
of non-detection of the advantage of B over A, both observed and
theoretical, are set out in the following table.


Relative frequencies of failure to detect a real advantage
of B over A in systematic half drill strip experiments

Theoretical   Method of parabolic curves   Student's method
percent       percent                      percent
20            [illegible]                  [illegible]
40            40.8                         46.7
60            [illegible]                  [illegible]
80            [illegible]                  75.8


Again, this is the objective part of the answer to the question
whether the lack of randomization ruins the t test. The first column
gives you the theoretical frequency of cases in which the t test should
fail to detect the advantage of B over A. The other columns show what
these frequencies would be in a number of experiments in which the
variability of the soil and the experimental errors are exactly as they
were in actual uniformity trials. Is the disagreement sufficient to say
that the t test is of no use when applied to the systematic half drill
strips? This, as I said, is a personal question. So far as I am
concerned, the agreement between the theory and the empirical results
seems to be satisfactory. Especially in the case of parabolic curves,
the t test both detects the advantage of B when it exists and suggests
its existence when it does not exist with relative frequencies very
much the same as indicated by the theory.


In consequence I do not see any evidence that lack of randomization
by itself is ruinous to statistical tests. We must, however,
remember the following points.

(i) The above empirical results refer to one particular systematic
arrangement in half drill strips: ABBA, etc. It is possible
that if we take any other systematic arrangement, the conclusions
suggested by the empirical results would be different. If we take
the systematic arrangement of the blocks with more than two objects,
then probably the advantage of the method of parabolic curves over the
ordinary formulas for randomized blocks will be more marked than in
the case of half drill strips, but this requires an empirical test.


(ii) The waves of fertility are an important feature that should
be borne in mind in any case, and especially when the trials are arranged
systematically. Whenever I was able to ascertain the direction of
ploughing, I found that the fertility seems to stay steadier along the
direction of ploughing than across it. It seems to me that the direction
of ploughing may be the real cause of these waves, but I have no definite
evidence of this. Sometimes the waves are difficult to detect
when you simply look at the uniformity trial data. In other instances
they are very pronounced. The following little table gives a part of
the uniformity trial data with rye as described by Hansen.* Looking at
them you will hardly believe that all the plots were sown with the same
variety and equally treated, but this is a fact.


Hansen
Yields of rye. Uniformity trial data, 1909.
[Table not recoverable; it marked the probable direction of ploughing.]





Imagine now you use this field for an actual experiment, and cut
your plots along the columns. The results would be deplorable. On the
other hand, if long and narrow plots were cut across the columns, the
experiment might have been fairly successful.


* Hansen, "Prøvedyrkning paa Forsøgsstationen ved Aarslev,"
Tidsskrift for Planteavl 21, 553-617, 1914.

If practical circumstances forced one to cut the plots along the
columns of the above, say four rows deep, so that out of each column
we had two plots, then it would be most inadvisable to arrange a
systematic experiment replicated exactly in the two rows, e.g.


ABCD, ABCD, ... 
ABCD, ABCD, ... 


since the second row would repeat almost identically the same soil errors
as there are in the first. In such circumstances a randomized arrangement
would be most useful. In this sense, the randomized arrangements do
have definite advantages over the systematic ones.
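One simple way to avoid repeating the same arrangement in both rows is to randomize the order of the objects independently in each row. The following is a hypothetical sketch that assumes each row of plots is treated as a separate block. The function names are illustrative only.

```python
import random

def systematic_rows(objects, n_rows):
    """The same order in every row: inadvisable when rows share soil errors."""
    return [list(objects) for _ in range(n_rows)]

def randomized_rows(objects, n_rows, rng):
    """An independent random order of the objects in each row."""
    return [rng.sample(list(objects), len(objects)) for _ in range(n_rows)]

print(systematic_rows("ABCD", 2))   # [['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'D']]
print(randomized_rows("ABCD", 2, random.Random(0)))
```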


Turning to the question of the waves of fertility, I think that
from the point of view of accuracy of agricultural trials it would be
most useful to have some indication of their cause. Probably it would
not be too difficult to make a special experiment to discover whether
their direction is actually connected with that of ploughing. This,
however, would require a considerable area.


In any case it seems advisable to carry out all cultivation processes
common to all the plots, such as ploughing, in the direction
across their greater length.
ON CERTAIN PROBLEMS OF PLANT BREEDING


A conference with Dr. Neyman in room 4090 of the Department of
Agriculture, 2th April 1937, 10 a.m., Dr. S. C. Salmon presiding.


The problem that I am going to discuss refers particularly to
the breeding of new varieties of sugar beets. However, it is probable
that in the process of breeding other plants similar problems arise,
and therefore the present discussion may have a wider interest.


The idea of the problem originated from contact with sugar beet
breeders in Poland. The results that I am going to present, however,
are due to Mrs. Y. Tang, M.Sc.; all the details will soon be
published in her paper prepared at the Department of Statistics,
University College, London (cf. footnote p. 74).


The process of breeding new varieties of sugar beets is fairly
complicated, but a rough idea of its essence can be obtained from the
diagram on the next page, representing schematically five distinct steps.
In considering these steps we must remember several important points
concerning the sugar beet. The first is that the sugar beet is a two-year
plant. During the first vegetative season a seedling produces a
plant with a big root containing a considerable amount of sugar but
yielding no seeds. Those are produced in the course of a second vegetative
season, when the plant uses the food accumulated previously in its
roots in the form of sugar. The second important point consists in the
fact that the sugar beet is a cross fertilizing plant, and this makes
it extremely difficult, if not impossible, to produce anything like a
pure line. Finally, we must remember that the production of new varieties
may have various aims: we may try to produce beets with the highest sugar
content, or beets giving the highest yield of roots per acre, or beets
producing the highest yield of sugar per acre. The discussion which follows
applies to all three cases, but we shall have in mind only the first of them.


Having all this in mind, let us consider the diagram and see what
are essentially the five consecutive steps leading to new varieties.


The first step consists in choosing out of the existing varieties a
number of roots which, for various reasons, seem to be promising, and in
forcing them to cross fertilize. For this purpose you plant those roots
in pairs on plots isolated from one another in a larger field of some
cereal.


It is hoped that the capacity of producing high sugar content in
old varieties may be cumulated as a result of crosses between them.
But it is clear that a cross must sometimes cumulate the capacity of
producing a low sugar content. Therefore not all of the progeny of the
crosses are suitable for further breeding, and we have to perform a
selection.


All the seeds produced by the crosses are sown on a larger plot
and produce roots, forming the material for what is called the individual
selection, the second step in our scheme. At the end of the
vegetative period all the roots are lifted, washed, and weighed. Out
of each root a small portion is cut out and analyzed for sugar content.
This cutting does not kill the root, which is able to produce seeds as
if it had been left intact. The majority of roots so analyzed are
discarded as unsatisfactory. The remaining ones, with the highest sugar
content, or having certain morphological characteristics indicating that
they may be able to produce high sugar content, are stored for the winter,
and then in the spring are planted separately on isolated plots to produce
seeds. This is the third step in our scheme. Each of the selected
roots is called a parent plant, and originates a new variety.


Obviously each parent plant is able to produce only a very limited
amount of seed. Therefore, two or more vegetative seasons must be
used to multiply the seeds of the new varieties, and this is described
in the diagram as step IV.


The fifth and last step consists in checking whether, and which,
of the newly bred varieties possess any marked advantage in sugar
content over some established standard. We must remember that the sugar
content of any individual root depends not only on the genetical
composition of the plant but also, frequently to a greater extent, on
various conditions of environment.


Consequently the sweetest of the parent plants selected in step
II do not necessarily produce the varieties with the highest sugar
content. Also it is possible that still sweeter varieties might have been
produced by some of the roots grown in step II that, owing to
uncontrollable variation of environment, had a small sugar content and
unfortunately were discarded. The field trials (step V) are meant to
eliminate the individual variability of sugar content in roots of a new
variety. We may also put it otherwise: the analyses in V are a comparison
of varieties, wherein the properties of individual roots are more or
less ignored.


Needless to say, alongside the field trials in step V we
continue to multiply the seeds of the new varieties, and the final decision
as to whether any one of them is a success or not is not made after one
year only, but after several years' trials. But those are details.


Anyhow, after the fifth step is concluded the breeder has to
decide which of the new varieties are suitable for being put on the
market. Other families are discarded as failures.


I must call your attention to certain consequences of the fact
that the sugar beet is a cross fertilizing plant, and consequently that
any single individual is heterogeneous with respect to a number of
pairs of genes. The consequence is that what we call a "new variety"
does not represent anything stable, but changes from generation to
generation.


Further, according to the law discovered by Galton, which is
a consequence of the Mendelian laws, the change is unfavorable to the
breeder: there is necessarily a regression (i.e. a set-back) in sugar
content. This makes it impossible for the breeder to find just one or
two exceedingly sweet varieties and keep them for reproduction, without
selection, from year to year. After a relatively short period the sugar
content of new generations will drop low and he will lose the market.
Consequently each breeder has to repeat constantly the steps described
above, perhaps with certain modifications, starting each year with
step I while continuing the following steps applied to varieties
started in previous years.
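The set-back can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions, not a genetic model of the sugar beet: each root's observed sugar content is taken to be a heritable part plus an independent environmental part, both normal, with a hypothetical population mean of 18 percent and hypothetical standard deviations of 0.5 and 0.7; offspring are assumed to keep their parent's heritable part and to receive a fresh environmental deviation. Under these assumptions the sweetest roots are partly merely lucky ones, so their offspring fall back toward the population mean. The sketch captures only this environmental share of the regression; the Mendelian segregation referred to above would add a further set-back.

```python
import random
import statistics

def selection_setback(n_roots=20000, keep_fraction=0.01, mean=18.0,
                      sd_genetic=0.5, sd_env=0.7, seed=1):
    """Toy model with hypothetical parameters: phenotype = genetic + environment.

    Returns (mean sugar content of the selected roots, mean of their offspring,
    population mean). Offspring keep the parent's genetic value exactly and get
    a new environmental deviation, so only the environmental regression shows.
    """
    rng = random.Random(seed)
    genetic = [rng.gauss(mean, sd_genetic) for _ in range(n_roots)]
    phenotype = [g + rng.gauss(0.0, sd_env) for g in genetic]
    # Select the sweetest roots on their observed (phenotypic) sugar content.
    n_keep = max(1, int(n_roots * keep_fraction))
    ranked = sorted(range(n_roots), key=lambda i: phenotype[i], reverse=True)
    chosen = ranked[:n_keep]
    selected_mean = statistics.fmean(phenotype[i] for i in chosen)
    # Offspring: same genetic value, fresh environment.
    offspring_mean = statistics.fmean(genetic[i] + rng.gauss(0.0, sd_env)
                                      for i in chosen)
    return selected_mean, offspring_mean, mean

sel, off, pop = selection_setback()
print(f"selected roots {sel:.2f}%, their offspring {off:.2f}%, population {pop:.2f}%")
```

The function name `selection_setback` and all its default values are illustrative inventions; changing `sd_env` relative to `sd_genetic` shows how a noisier environment enlarges the set-back.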


Another consequence of the instability of the varieties is the
instability of the standard variety, with which the new varieties are
compared in step V. As each variety necessarily changes from year to
year, so must the standard change, even if it bears the same label.


In Poland it is usual to take as standard that variety which in
the preceding year proved to be the sweetest. The beet sugar industry
arranges each year competitive experiments with a number of varieties
produced by several leading firms. Those experiments are carried out
in a number of places all over the beet growing districts of Poland,
all according to a certain fixed method, with the same number of
replications, etc.


The seeds used are purchased on the market by a special committee
and sent out to stations bearing conventional numbers but not the names
of the producers.


The results are then officially published, and each of the
producers can see how the results of his own efforts compare with those of
the others. The consumers, in turn, are able to judge whether they were
lucky in selecting the particular growers from whom they bought the
seeds. They try also to make some forecasts for the future. In this,
however, they are frequently misled, because what is called variety A
produced by firm X and put on the market in 1937 is essentially
different from what will bear the same label in 1938. Frequently it will
also compare in a different way with another variety. Still, there is a
certain amount of relative stability, and the publication is useful.


DR. DEMING: Could you give some figures, and the references where
they and others like them are published?


DR. NEYMAN: The results of Polish competitive experiments are 
published yearly in Gazeta Cukrownicza, the official organ of the beet 
sugar industry. The following table gives the average results of the 
experiments of 1936. 
DELEGACJA NASIENNA
Polskiego Przemysłu Cukrowniczego

Wyniki doświadczeń, wykonanych w roku 1936 nad [...] buraków cukrowych
z różnych odmian nasion

Przeciętne wyniki z 13 pól doświadczalnych.

(Doświadczenia wykonano w 120 powtórzeniach. Wyniki doświadczeń uwzględniono z mnożnikiem maksymalnym 5,
zmniejszanym odpowiednio za punkty karne.)
The headings translated into English run like this.


Seed Testing Commission of the Polish Beet Sugar Industry


Results of the competitive trials with sugar beets of various origins,
carried out in 1936


Average results of 13 trials with the total number of 120 replicates
(the results of the trials were taken into account with a maximum
multiplier of 5, reduced correspondingly for penalty points)


Breeder and variety | Yield of roots, quintals per hectare | Yield of leaves, same units | Average weight of root, kilograms | Sugar content, percent | Yield of sugar, quintals per hectare





No. 8 translates into "variety N., last year's seeds, chain control."


odm. = variety


1 quintal = 100 kilograms 
1 are = 100 square metres = 0.02471... acres 
1 hectare = 100 ares 
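For readers used to American units, the conversions listed above can be chained. The sketch below is not part of the original table; it turns a yield in quintals per hectare into pounds per acre, using 1 quintal = 100 kg and 1 hectare = 10,000 square metres from the list above, together with the international definitions 1 acre = 4046.8564224 square metres and 1 pound = 0.45359237 kg. The function names are hypothetical.

```python
QUINTAL_KG = 100.0       # 1 quintal = 100 kilograms
HECTARE_M2 = 10_000.0    # 1 hectare = 100 ares = 10,000 square metres
ACRE_M2 = 4046.8564224   # international acre, in square metres
POUND_KG = 0.45359237    # international avoirdupois pound, in kilograms

def quintals_per_hectare_to_pounds_per_acre(q_per_ha: float) -> float:
    """Convert a yield in quintals per hectare to pounds per acre."""
    kg_per_m2 = q_per_ha * QUINTAL_KG / HECTARE_M2
    return kg_per_m2 * ACRE_M2 / POUND_KG

def ares_to_acres(ares: float) -> float:
    """1 are = 100 square metres."""
    return ares * 100.0 / ACRE_M2

print(round(quintals_per_hectare_to_pounds_per_acre(1.0), 2))
print(round(ares_to_acres(1.0), 5))
```

One quintal per hectare comes to roughly 89.2 pounds per acre, and one are to about 0.02471 acres, matching the figure given above.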


The experimenters are not informed of the varieties of seeds
contained in the bags.* Nos. 2, 7, and 11 contained the same variety,
serving as a control of the accuracy of the experiment.


Similar publications exist also in other countries, but I cannot
give exact references. If I remember aright, in England the competitive
experiments are carried out by the National Institute of Agricultural
Botany in Cambridge. German results are, I think, published in the
"Zuckerrübenbau."


* Dr. Neyman wrote to the editor as follows: "You may be surprised
that the experimental results are printed, whereas the names of the
varieties and the breeders are typed. This is because the latter
information is kept secret till the meeting where the accuracy of the
experiments, and the agreement between the results of the experiments
carried out in various places, are discussed. If the experiments
disagree, then the commission may decide that the trials failed to provide
reliable data, and in such cases they do not open the sealed envelopes
containing both the order numbers and the names of the varieties; and
the results of the trials are not made public." Editor.


As I have mentioned, the standard used by Polish breeders is
always the variety that in the last year's competitive trials proved
to be the sweetest. It changes from year to year.





It should be noted that this does not apply to the process of
breeding of other plants. For example, in Ireland, where most of the
barley grown is bought by the Guinness Brewery, there is a perfectly
established variety (barley is self fertilizing), namely the one that
is preferred by Guinness. The aims of the breeders consist in
surpassing this variety in certain of its properties, and it is used as the
standard in most (if not all) of the field trials. Later on I shall mention
also some other important differences between breeding barley in Ireland
and sugar beets in Poland, which have a direct relation to the problems
that I shall discuss.








After this somewhat lengthy preliminary, we may turn to those
problems. They are statistical in character and refer to steps II and
V on page 68. Their aim is to see how the breeder is likely to increase
his chances of success. We must now review some of the causes
of his being unsuccessful.








1. He may be unlucky in choosing plants for his crosses in step I.
But this is not a statistical problem.


2. Supposing he was successful in I, he may be unlucky in II
by failing to select for further breeding the roots that have the best
genetical properties. This is a problem that is partly botanical and
partly statistical. The statistician may advise the breeder to select
for further breeding as many parent plants as he possibly can, so as
not to omit the best ones. I shall call this advice A.


3. Suppose now that the breeder was successful both in step I
and in step II (page 68) and, consequently, that some of his new varieties that
actually come up for comparison with the standard in V are better than
the standard. Obviously he may again be unlucky and lose those new
varieties. The accuracy of the field trials is known to be limited, and
it is just possible that through unavoidable errors the experiments fail
to detect the goodness of the best varieties, and these will eventually
be discarded. This, of course, would be most unfortunate, since it
would mean a total waste of a considerable amount of effort, money, and
time: a number of years! This again is a problem for the statistician,
and he will give what I shall call advice B: make your experiments as
accurate as possible; if you cannot improve the method,
then increase the number of replications.


Both pieces of advice, A and B, are of course sound, but they will seem
very troublesome to the practical breeder. His means are always more
or less limited, and so is the area on which to carry out the field trials.
Now each piece of advice, A and B, if followed, leads to an increase in
the area of the field trials. And the breeder will ask: I can afford
only so many experimental plots. I may use them to test a few new varieties,
and in this case the trials will have many replications and therefore
be fairly accurate; or to test many new varieties, but then there will be only
few replications and the test will be only superficial. Which is better? Or
rather: what is the right proportion between the number of new varieties
to be started each year and the number of replications in the field trials
comparing these varieties with the standard?
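The breeder's question can be made concrete with a crude sketch. It is not Mrs. Tang's method, and every input is an assumption: a fixed budget of plots per year, each new variety needing m plots of its own (the standard's plots are ignored), true excesses Δ of the new varieties normally distributed with a hypothetical mean of -0.5 and spread of 0.5, a known per-plot standard error σ₀, and the normal distribution standing in for Student's t when computing the chance of detection. Under these assumptions it estimates how many "good" varieties (Δ > 0) would be recognized as such.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def expected_good_detected(total_plots, m, alpha=0.05, sigma0=1.0,
                           delta_mean=-0.5, delta_sd=0.5, steps=2000):
    """Expected number of varieties with Delta > 0 declared better than the
    standard, for one year's budget of plots (all inputs hypothetical)."""
    n_varieties = total_plots // m            # each new variety gets m plots
    sigma = sigma0 * (2.0 / m) ** 0.5         # S.E. of a difference of two means of m plots
    z = STD_NORMAL.inv_cdf(1.0 - alpha)       # normal stand-in for Student's t0
    prior = NormalDist(delta_mean, delta_sd)  # assumed distribution of true excesses
    upper = delta_mean + 8.0 * delta_sd
    if upper <= 0.0:
        return 0.0
    width = upper / steps
    per_variety = 0.0
    for k in range(steps):                    # midpoint rule over 0 < Delta < upper
        d = (k + 0.5) * width
        per_variety += prior.pdf(d) * STD_NORMAL.cdf(d / sigma - z) * width
    return n_varieties * per_variety

for m in (2, 4, 8, 16, 32):
    print(f"m={m:2d}  varieties={320 // m:3d}  "
          f"good ones detected ~ {expected_good_detected(320, m):.2f}")
```

The best m in such a table depends heavily on the assumed distribution of Δ, which is exactly the quantity the breeder does not know in advance.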


This is just the problem that was dealt with by Mrs. Y. Tang,* one
of the students at University College in London. Her results show how
to calculate approximately what would be the results of plant breeding
for any given ratio of the number of new varieties to the number of
replications used. Of course, the final results of such calculations
must depend on many local conditions.


It is interesting to note that the solutions of the above problem
advanced by practical breeders, most probably on intuitive grounds,
differ enormously. The number of new families of sugar beets started
yearly by the Polish breeders goes into hundreds, while the number of
replications they use is sometimes as small as four, and I have not heard
of its exceeding sixteen. On the other hand, the breeders of barley in
England and Ireland start with only four or perhaps five new families
and then test them in perhaps 40 half-drill strips! It is entirely
possible that this difference is due to special characteristics of the two
particular plants and also to the cost of land, labor, etc. But it is
possible also that the general intuition of the practical worker was
misled in one case or the other.


I must now remind you of the nature of the errors that may be
committed when testing statistical hypotheses. I will do so by treating
the particular case of the comparison between a new variety V and the
standard S. Denote by V and S the true average sugar contents that
those two varieties are able to yield if sown on the whole experimental
field and if there were no technical errors. We are interested in the
difference

$$\Delta = V - S \qquad (1)$$

which may be termed the true sugar excess of the variety V over the
standard, or simply the sugar excess, for short. If Δ is positive, then
the new variety V will be considered satisfactory; otherwise it is a
failure. The experiment does not give us the true value of Δ but only
its estimate, x, which is always affected by a positive or negative
experimental error ε, so that

$$x = \Delta + \varepsilon \qquad (2)$$


* Mrs. Tang's paper is being printed in the forthcoming number of
Biometrika, 29, Parts iii and iv, 1937.


Before putting the variety V on the market the breeder wants to
have some "evidence" that it is satisfactory, i.e. that Δ (not x) is
positive. He must be particular on this point, because otherwise his
goods will frequently be inferior, and he will lose his customers.
Mathematical statistics is helpful in this instance and provides means
by which the frequency of cases when Δ is judged to be positive without
being positive in actual fact can be reduced to any low level, α, called
the level of significance and chosen in advance.


Statistically, the problem of the breeder reduces to the test
of the hypothesis H₀ that

$$\Delta = V - S \le 0 \qquad (3)$$

If as a result of this test we decide to reject the hypothesis H₀, this
is equivalent to a recognition that we have "evidence" of Δ being positive,
i.e., of the new variety being better than the standard.


The test of the hypothesis H₀ will consist in the rule of rejecting
H₀ whenever

$$\frac{x}{s} > t_0 \qquad (4)$$

where s is the estimate of the standard error of x, and t₀ is a constant
number taken from Fisher's tables in accordance with the number of
degrees of freedom on which the estimate s is based, and corresponding
to his P = 2α (Fisher's tables give the two-sided probability P, so the
one-sided level α corresponds to P = 2α). This test was originated by Student.


The properties of this test are: (i) whenever the new variety
is barely as good as the standard, i.e. when Δ = 0, the hypothesis tested
will be rejected (and thus an unsatisfactory variety put on the market)
only with the relative frequency equal to α; (ii) whenever H₀ is true
and the new variety is not so good as the standard, i.e. when Δ < 0,
this frequency will be even smaller than α; (iii) whenever the hypothesis
tested is wrong and the new variety is superior to the standard, i.e.
when Δ > 0, the above test will detect this circumstance more
frequently than any other imaginable test.*


* J. Neyman and E. S. Pearson, "On the problem of the most efficient
tests of statistical hypotheses," Phil. Trans. Roy. Soc. London A 231,
289-337, 1933.


We must be clear on this point, and therefore let us consider some
numerical illustrations. One breeder, A, may desire that the proportion
of his unsuccessfully bred varieties that reach the market should
not exceed 5 percent. Then he puts his level of significance at α = 0.05. If
the number of degrees of freedom is in his case 12, then t₀ = 1.782.
Thus he will reject the hypothesis H₀ and say that his variety is good
enough to be put on the market when x > 1.782 s. Some other breeder, B,
may consider 5 percent of unsatisfactory varieties too great
a limit; he may consider that the proportion of such varieties slipping
on to the market should not exceed 1 percent. In such a case he would
put α = 0.01 and select t₀ corresponding to P = 2α = 0.02. On this
basis B would let his new variety through only if x > 2.681 s. Some
other breeder may be still more cautious.
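The two critical values quoted, 1.782 and 2.681 for 12 degrees of freedom, can be checked without tables. The sketch below uses only the Python standard library: it integrates Student's t density numerically with Simpson's rule and finds the one-sided critical value by bisection, then applies rule (4). In practice a statistics library would be used; the function names here are hypothetical.

```python
import math

def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(x: float, df: int, intervals: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / intervals                      # intervals must be even for Simpson
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(df: int, alpha: float) -> float:
    """One-sided t0 with P(T > t0) = alpha, i.e. Fisher's two-sided P = 2 * alpha."""
    lo, hi = 0.0, 50.0
    for _ in range(60):                    # bisection on the upper-tail probability
        mid = (lo + hi) / 2
        if 1.0 - t_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def variety_passes(x: float, s: float, df: int, alpha: float) -> bool:
    """Rule (4): reject H0 (Delta <= 0) and pass the variety when x / s > t0."""
    return x / s > t_critical(df, alpha)

print(round(t_critical(12, 0.05), 3), round(t_critical(12, 0.01), 3))
```

The last line prints 1.782 and 2.681. A variety with x = 2s would thus satisfy breeder A but not breeder B.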


DR. SARLE: Is there any danger of being too particular?


DR. NEYMAN: Oh yes, there is, and I am most grateful for this
question. The danger consists in the fact that whenever we are too
particular in trying to avoid unjust rejections of the hypothesis tested
(that is, rejections when it is in fact true), we expose ourselves to an
increased risk of failing to detect cases when V is actually better than S.


At this stage it will be convenient to use the special terminology
introduced to distinguish between the two kinds of errors that we may
make when testing a statistical hypothesis (Lecture III, p. 45) and, in
particular, when judging whether a given variety is or is not better
than the standard. If as a result of a test we reject a hypothesis when
it is in fact true, we say that the error committed is of the first kind.
Thus when the breeder puts on the market a variety that does not exceed
the standard, he commits an error of the first kind. The error of
the second kind consists in accepting the hypothesis tested when it is
in fact false. Thus, when the breeder does not find sufficient reason
for judging his variety satisfactory, i.e. when x/s ≤ t₀, although his new
variety is actually sweeter than the standard, so that, though he does
not know it, Δ > 0, then he commits an error of the second kind.


The errors of the first kind are dangerous to the trade of the
breeder, but so are the errors of the second kind. It must be remembered
that each error of the second kind means a complete waste of the effort
and money spent over a good number of years: after all those years a variety
exceeding the standard in sugar content is successfully produced, and then an
error of the second kind causes it to be discarded. Therefore it is necessary to
have as clear an idea as possible of the chance of committing an
error of the second kind. For the numerical evaluation of such errors
we use the charts on page 61, which were introduced for the study of
randomized and systematic experiments.


In the notation being used here, the "standardized" quantity on which
the probability of an error of the second kind depends is

$$\rho = \frac{\Delta}{\sigma} = \frac{V - S}{\sigma} \qquad (5)$$

This is the true value of Δ divided by the true value of σ, σ being the
true S.D. of x (not the estimate s as used in Eq. 4).
To illustrate the use of the diagrams in answering the question
raised by Dr. Sarle, we suppose that the arrangement contemplated for a
future experiment is in randomized blocks with three varieties and six
replications, making n = 10 degrees of freedom (that is, (3 - 1)(6 - 1)).
Suppose further that previous experience indicates that σ may be taken
as something like 0.5. Let us now see what in these circumstances would be
the chance of detecting that a particular variety is better than the
standard when Δ is actually positive and as large as 1 percent. To answer
this question we calculate ρ = Δ/σ = 2 and refer to the curves corresponding
to n = 10 on page 61. It is seen that if we use the level of significance
α = 0.05 (the upper chart, page 61) then the probability of an error of the
second kind is about 0.42. On the other hand, if α = 0.01 (referring to the
lower chart, page 61) the probability of this is 0.65. This means that
if the true value of the mean excess is as large as 1 percent, and if
we use alternatively α = 0.05 and α = 0.01, then in the circumstances
of the experiments the mere existence of the advantage of the new variety
over the standard will be detected only in about 58 or 35 cases
respectively out of a hundred. You see here how the excess of caution
with respect to errors of the first kind (0.01 in place of 0.05) leads
to an increased chance (65 out of 100 in place of 42 out of 100) of
committing errors of the second kind.
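The chart readings in Dr. Sarle's example can be set beside a simulation. This sketch assumes, as in equation (2), that x is normal with mean Δ and standard deviation σ, and that s² is independent of x and distributed as σ²χ²/n with n degrees of freedom. The critical values 1.812 and 2.764 for n = 10 are taken from standard t tables. It estimates the probability of rejecting H₀, that is, one minus the probability of an error of the second kind. A simulation carries sampling error and need not agree exactly with values read from a chart.

```python
import math
import random

def detection_probability(rho: float, df: int, t0: float,
                          n_sim: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(x / s > t0) when Delta / sigma = rho.

    Everything is measured in units of sigma: x ~ N(rho, 1), and
    s = sqrt(chi2 / df) with chi2 a chi-square variable on df degrees of
    freedom, drawn independently of x.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        x = rng.gauss(rho, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)   # chi-square = Gamma(df/2, scale 2)
        s = math.sqrt(chi2 / df)
        if x / s > t0:
            hits += 1
    return hits / n_sim

# Dr. Sarle's example: Delta = 1, sigma = 0.5, hence rho = 2, with n = 10.
# t0 = 1.812 (alpha = 0.05) and 2.764 (alpha = 0.01) from standard t tables.
for alpha, t0 in ((0.05, 1.812), (0.01, 2.764)):
    power = detection_probability(2.0, 10, t0)
    print(f"alpha = {alpha}: detect {power:.2f}, error of 2nd kind {1 - power:.2f}")
```

Running it with `rho = 0` checks the other side of the test: the rejection frequency should then be close to α.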


It is obvious that the graphs on page 61, describing the dependence
of the probability of errors of the second kind on the values of ρ and
n, are relevant to the problems in plant breeding
considered here. In fact any seed breeding establishment, after a few
years of its existence, must be aware of the size of the standard error
per plot, say σ₀, which is likely to hold in future experiments. It is
impossible to predict its exact value, but it is certainly possible to
make rough estimates of its upper limit. Therefore the breeder
contemplating experiments with m replications is able to substitute some
reasonable number for σ into the expression ρ = Δ/σ, taking

$$\sigma = \sigma_0\sqrt{2/m} \qquad (6)$$

which is the standard error of the difference between two means, each
based on m plots.

He may then use the tables or graphs of the probabilities of
errors of the second kind to find out approximately what will be his
chance of detecting the advantage of his varieties when Δ = V - S has
any value he may be interested in. If he finds that for a certain value
of m this chance is too small, then he will think of increasing the
number m of replications. This will decrease the value of σ, increase
that of ρ, and consequently decrease the probability of an error of the
second kind, i.e. of failing to detect a good variety. This procedure
must be considered essential in any rational planning of experiments.
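The planning procedure built on equations (5) and (6) can be sketched as a small search over m. To stay within the standard library, the sketch replaces Student's t by the normal distribution, which somewhat overstates the chance of detection when the error variance rests on few degrees of freedom. The values of σ₀, Δ and the target chance are inputs the breeder must supply; those in the usage line are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def rho(delta: float, sigma0: float, m: int) -> float:
    """rho = Delta / sigma, with sigma = sigma0 * sqrt(2 / m) as in (5) and (6)."""
    return delta / (sigma0 * (2.0 / m) ** 0.5)

def approx_detection(delta: float, sigma0: float, m: int, alpha: float = 0.05) -> float:
    """Normal approximation to the chance of detecting a variety with excess Delta."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(rho(delta, sigma0, m) - z)

def replications_needed(delta, sigma0, target=0.9, alpha=0.05, max_m=1000):
    """Smallest m (from 2 upward) whose approximate detection chance reaches
    target, or None if max_m replications are not enough."""
    for m in range(2, max_m + 1):
        if approx_detection(delta, sigma0, m, alpha) >= target:
            return m
    return None

# Hypothetical: sigma0 = 0.5 percent per plot, Delta = one-third of a percent.
print(replications_needed(delta=1 / 3, sigma0=0.5))
```

With these hypothetical inputs the search returns 39 replications, far more than the four to sixteen mentioned earlier for Polish practice.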


But in the case of the plant breeder a special difficulty arises.
Suppose he finds that with 5 replications and α = 0.05, the probability
of detecting a good variety for which V exceeds the standard S by 5
percent is a fairly large one, say 0.9. It will be seen that this
result is not very helpful. In fact, it is difficult to say beforehand
how frequently his steps I to IV (page 68) will yield him new varieties
exceeding the standard in sugar content by as much as five percent. It
is possible that such success in breeding is unthinkable, and that
usually Δ does not exceed, say, one-third of a percent.


Looking at the above graphs, it is easy to find that in such a
case the chance of the breeder detecting the goodness of any of his
varieties will be very small. Thus if he keeps arranging his experiments
with only m = 5 replications, then practically all of his efforts in
breeding new varieties will be wasted.


It is seen that the solution of the breeder's problem requires
not only knowledge of the probabilities of errors of the second kind
but also of the distribution that he is likely to obtain in the future
for the values of Δ in the population of his new varieties. It is
impossible to predict what will happen in the future, but it is possible
to make rough guesses by studying what happened in similar circumstances
in the past. We may try to estimate what was the distribution of Δ in
past years and use those estimates to obtain an idea of what may happen
in the future.


The problem may be stated as follows. In some particular year,
M experiments comparing a large number N of new varieties with the same
standard gave the estimates of sugar excesses x₁, x₂, ..., x_N, and M
estimates of the corresponding standard errors s₁, s₂, ..., s_M. It is
required to use those numbers to estimate the distribution, say p(Δ), of
the true excesses Δ₁, ..., Δ_N of those new varieties.


A similar problem was previously considered by Eddington, and the
solution is quoted by Levy and Roth,* but Mrs. Tang offers a new
approach.** Her method consists in the following.


Denote by ν_k and μ_k the kth moments about zero of x and Δ
respectively, and by σ² the variance of the experimental error ε in the
observations x. If it be assumed that ε is normally distributed, which
is a traditional assumption, and independent of Δ, then, as Mrs. Tang
has calculated,

$$\begin{aligned}
\mu_1 &= \nu_1 \\
\mu_2 &= \nu_2 - \sigma^2 \\
\mu_3 &= \nu_3 - 3\sigma^2\nu_1 \\
\mu_4 &= \nu_4 - 6\sigma^2\nu_2 + 3\sigma^4
\end{aligned} \qquad (7)$$


* H. Levy and L. Roth, Elements of Probability (Oxford University
Press, 1936).


** The editor [...] Mrs. Tang's method is not related to a scheme
devised by Dr. W. A. Shewhart, "Correction of data for errors of
measurement," Bell System Technical Journal 5, 11-26, 1926.
Mrs. Tang uses the assumption that σ has the same value in all the M
experiments. This is partly justified by the fact that all of them are
carried out on the same large field, by the same staff, and with varieties
having many similar properties. The common value of σ can then be
estimated with great accuracy, being based on hundreds of degrees of
freedom. This estimate, s′, may be substituted in (7) for σ. Next the
observed values of x can be used to estimate the moments ν₁, ν₂, ν₃, ν₄.
Together with s′ they will yield the estimates of μ₁, μ₂, μ₃, μ₄.
Finally, having obtained the μ's, Mrs. Tang uses them to fit a Pearson curve,
which is then considered as an estimate of p(Δ).
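Equations (7), together with the sample moments of the observed x, can be coded directly. This is a minimal sketch assuming a single common σ, as in the text, and plain sample moments about zero. Mrs. Tang's final step, fitting a Pearson curve to the resulting μ's, is not shown, and the function names are hypothetical.

```python
def moments_about_zero(xs, k_max=4):
    """Sample moments about zero, nu_1 .. nu_k_max, of the observed excesses x."""
    n = len(xs)
    return [sum(x ** k for x in xs) / n for k in range(1, k_max + 1)]

def true_excess_moments(nu, sigma):
    """Equations (7): moments about zero of Delta from those of x = Delta + eps,
    with eps normal, mean 0, variance sigma**2, and independent of Delta."""
    nu1, nu2, nu3, nu4 = nu
    s2 = sigma ** 2
    mu1 = nu1
    mu2 = nu2 - s2
    mu3 = nu3 - 3 * s2 * nu1
    mu4 = nu4 - 6 * s2 * nu2 + 3 * s2 ** 2
    return [mu1, mu2, mu3, mu4]
```

As a check, if Δ takes the values -1 and +1 equally often and σ = 0.5, the moments of x are 0, 1.25, 0 and 2.6875, and `true_excess_moments` recovers 0, 1, 0, 1, the moments of Δ itself.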


It is difficult to test the efficiency of this method theoretically,
but Mrs. Tang tried an empirical test. She started with an arbitrarily
selected distribution represented by the histogram in Fig. 1 on the next
page. She considered the histograms as the true distributions of N
values of Δ in two hypothetical sets of experiments. Next she used Tippett's
numbers or Mahalanobis's table* to produce normally distributed values of x,
such as might have been produced by N experiments satisfying her assumptions.
In a similar way she obtained M values of the estimate of the error
variance, each corresponding to one hypothetical experiment. Having
obtained those quasi-empirical figures, she applied her method to estimate
the distributions of Δ. Fig. 1 shows the results obtained. It is seen
that the continuous curves do agree with the "true distributions"
represented by the histograms.


DR. SARLE: I am wondering what you used for a check.


DR. NEYMAN: I will explain it again. Let us assume that the
true distribution of Δ is as follows:


[Table: values of Δ and their frequencies; illegible in this copy.]


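It is seen here that one of the Δ's is equal to -6, three others to -5,
etc. Write down the Δ's in one column, thus:

$$\begin{aligned}
\Delta_1 &= -6 \\
\Delta_2 &= -5 \\
\Delta_3 &= -5 \\
\Delta_4 &= -5 \\
\Delta_5 &= -4 \\
&\ \text{etc.}
\end{aligned} \qquad (8)$$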




* P. C. Mahalanobis, Sankhya, The Indian Journal of Statistics 
(Calcutta) 1, 303-328, 1934. 
Next take from the table of Mahalanobis (footnote, page 79) the
corresponding number of values of ε; he tabled them so that they may
be considered as values of a normal variate with mean zero and unit
standard deviation. Suppose that you find


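$$\begin{aligned}
\varepsilon_1 &= 0.03 \\
\varepsilon_2 &= \ldots \\
\varepsilon_3 &= -0.25 \\
\varepsilon_4 &= 0.53 \\
&\ \text{etc.}
\end{aligned} \qquad (9)$$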


Now add those numbers to your Δ's and you will obtain what might
be given by experiments if the true σ were unity and if the true Δ were
distributed according to the above table. The results,

$$\begin{aligned}
x_1 &= -6 + 0.03 = -5.97 \\
x_2 &= -5 + \varepsilon_2 \\
x_3 &= -5 - 0.25 = -5.25 \\
x_4 &= -5 + 0.53 = -4.47 \\
&\ \text{etc.}
\end{aligned} \qquad (10)$$

may now be used to estimate the distribution of Δ by the method of
Mrs. Tang. Fig. 1 represents the results (page 60).
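The construction of the artificial data in (8) to (10) can be mimicked with a pseudo-random generator in place of Mahalanobis's table. Only the first two frequencies below come from the text (one Δ at -6, three at -5); the rest are hypothetical, since the full table is not legible here. The sketch only shows that the simulated x are more spread out than the Δ, their variance exceeding that of Δ by about σ² = 1, which is the effect equations (7) correct for.

```python
import random
import statistics

# Hypothetical histogram of true excesses; only the -6 and -5 entries are from the text.
TRUE_FREQUENCIES = {-6: 1, -5: 3, -4: 6, -3: 10, -2: 14, -1: 16,
                    0: 14, 1: 10, 2: 6, 3: 3, 4: 1}

def artificial_experiments(freqs, sigma=1.0, seed=7):
    """Return (deltas, xs), where each x_i = Delta_i + eps_i and eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    deltas = [d for d, f in sorted(freqs.items()) for _ in range(f)]
    xs = [d + rng.gauss(0.0, sigma) for d in deltas]
    return deltas, xs

deltas, xs = artificial_experiments(TRUE_FREQUENCIES)
print(f"variance of Delta {statistics.pvariance(deltas):.2f}, "
      f"of x {statistics.pvariance(xs):.2f}")
```

With only a few dozen values the excess variance fluctuates noticeably around 1; multiplying all frequencies by a large factor makes it settle close to σ².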


You may have noticed that among the hypotheses of Mrs. Tang there
is one that is doubtful, namely that the value of σ is the same in all
experiments. Actually, when dealing with the results of real experiments,
it was found that this hypothesis may not be true. So Mrs. Tang checked,
again empirically, that her method is still applicable with σ varying
from one experiment to another within the limits that are likely to be
met in practice; see Fig. 2, showing the results with varying σ.


Having thus obtained an indication that her method does lead to
reasonable results, Mrs. Tang applied it retrospectively to estimate the
distribution of true sugar excess over the standard in a number of new
varieties tested in 1923 and 1924. The varieties were produced and tested
by the breeders K. Buszczynski & Sons, Ltd., of Warsaw, who kindly supplied
the numerical data from their trials. Out of a considerable number of these
trials, Mrs. Tang selected 40 carried out in 1923 and an equal number
carried out in 1924. Those were convenient as they had the same number
of replications, namely 5. In each of the two sets, 120 new varieties
were compared with the standard in a systematic arrangement like this:


$$S\;V_1\;V_2\;V_3\;S\;V_1\;V_2\;V_3\;S\;\ldots \qquad (11)$$


To work those experiments out, i.e. to calculate the estimates
xᵢ of the sugar excesses Δᵢ and the corresponding standard errors,
Mrs. Tang applied the method of parabolic curves.* Next she estimated
the distribution of Δ, the true sugar excess. Fig. 3 on page 84 gives
the result referring to 1924. Here the histogram represents the observed
distribution of x, and the continuous curve the estimated distribution of
Δ.

* See the conference on randomized and systematic experiments, pp. 51-52,
and the reference in the footnote at the bottom of page 51.


Similar curves calculated by the breeder may give him important
information of various kinds, which I shall classify under two headings.


1. He may use such curves to analyze his method of selecting
parent plants in step II of page 68. Having records of how he selected
them a few years ago, he may usefully study what would have been the
distribution of Δ if he had made his selection differently, say breeding only
half of the families that he actually took. This would have allowed
him to make a stricter selection of his parent plants, taking only the
very sweetest. Ignoring the new varieties bred from the parent plants
that in such a case would have been discarded, and estimating the
distribution of Δ for the remaining ones, the breeder would be able to see
whether taking many parent plants and breeding many new varieties does
represent a marked advantage.


2. Having the estimated distributions of Δ corresponding to his
actual experiments, and also to the stricter method of selection at
step II, the breeder will be able to use the probabilities of errors of
the second kind to see what would be the final results of his efforts,
including step V of page 68. Let us illustrate this for the estimated
distribution of Δ given in Fig. 3, page 84.


The breeder is naturally interested in those varieties for which
Δ > 0, called conventionally "good" varieties. Their proportion is
represented by the area under the curve to the right of the origin of Δ
(as in the curves of Figs. 1, 2, 3). The breeder will be interested to
know what proportion of these "good" varieties is likely to be detected
as such by his field trials arranged according to this or that plan.


Take any positive value of Δ within the range of the curve in
Fig. 3, calculate the corresponding value of ρ = Δ/σ, and use one of the
graphs on page 61 to determine the probability of an error of the
second kind corresponding to the value of ρ and to the number of degrees
of freedom considered for the trials. Subtract this probability from
unity and you will obtain the approximate value of the proportion P(Δ)
of good varieties that will be detected as such by the proposed trial.


Calculate P(Δ′) for a number of successive values Δ′ of Δ.
Next, take the estimated ordinate p(Δ′) of the distribution of Δ in the
population of your new varieties (as an example, the full line curve of
Fig. 3). This, multiplied by δΔ, is approximately equal to the proportion
of your varieties with Δ falling between Δ′ and Δ′ + δΔ. Multiplying,
you will get

$$p(\Delta')\,P(\Delta')\,\delta\Delta,$$

the proportion of the new varieties that (a) have their sugar excess
V - S between Δ′ and Δ′ + δΔ, and (b) will be detected as good
varieties by the field trials planned. Fig. 4 was made up in this way
from the "good" varieties of Fig. 3 on page 84, i.e., it was made up
from that part of the estimated distribution of Δ in Fig. 3 lying to
the right of the origin. The uppermost curve (a) of Fig. 4 is simply
the full line curve lying on the right of the origin in Fig. 3, rescaled
so as to make the area under this part of the
curve equal to unity: we are interested only in "good" varieties and
in the proportion likely to be detected as such. In other words, the
"good" varieties are here made the fundamental probability set of
Lecture I.


All the lower curves (b and c) represent plotted values of the
products p(Δ) P(Δ), where P(Δ) corresponds to α = 0.01 or 0.05 and to
different arrangements of the proposed experiments. It was assumed
that all these experiments would be arranged in randomized blocks and
differ only in the number of replications m, marked on each curve. The
curves corresponding to α = 0.05 end at the axis of ordinates at the
point 0.05. The other curves, corresponding to α = 0.01, have this
ordinate equal to 0.01.


The area under each curve represents the proportion of "good" varieties that will be recognized as such, for the given α and the given number of replications. Besides, the curves give the distribution of Δ for the "good" varieties that will be detected. You will see that if the stricter level of significance α = 0.01 is applied and the number m of replications is as small as 5, then the proportion of good varieties that will be detected is very small. You will find its value, 16.6 percent, in the small table attached to Fig.4, page 86. This number, 16.6 percent, is the area under the curve for α = 0.01 and m = 5, divided by the area under the curve marked (a). On the other hand, if α = 0.05, then the same proportion rises to 34.3 percent. If the number of replications is doubled, then the corresponding figures will be 31.9 and 48.5 percent respectively.


Apart from the proportion of "good" varieties likely to be detected, the breeder may be interested in the proportion of those for which the value of Δ is not merely positive but exceeds some arbitrary limit, say 0.2 percent of sugar. Such varieties may be termed conventionally the "best." There is no difficulty in calculating the proportions of the "best" varieties, the superiority of which over the standard would be detected by the trials. We have only to use the areas of all the curves to the right of the line Δ = 0.2. The corresponding figures are given in the two "best" columns of the table attached to Fig.4. For instance, in the table under "Probability of detecting a best variety," at α = 0.05 and m = 8, we see 0.696. This means that the area to the right of 0.2 percent under the curve for α = 0.05 and m = 8 is 0.696 of the area to the right of 0.2 percent under the curve marked (a). The area to the right of 0.2 percent under the curve (a) is now the fundamental probability set of Lecture I.


[Fig.4. Distributions of true sugar excess in the population of varieties tested and in the populations of varieties found significant at α = .05 and at α = .01; m = number of replications. Attached table: probability of detecting a good variety; probability of detecting a best variety.]
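
The construction of Fig.4 and of its table, multiplying p(Δ) by P(Δ) and comparing areas to the right of 0 (good varieties) or of 0.2 (best varieties), can be imitated numerically. This is a sketch only: the distribution of Δ below is a hypothetical normal curve, not the estimate of Fig.3; the plot standard deviation `sigma_plot` and the rule that a difference with m replications has standard error sigma_plot·√(2/m) are assumptions; and power uses a normal approximation. The output therefore illustrates the method, not the lecture's figures (16.6, 34.3, 31.9, 48.5 percent, 0.696).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power(delta, se_diff, alpha):
    """Normal approximation to P(delta) for a one-sided test at level alpha."""
    return 1 - _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1 - alpha) - delta / se_diff)


def detected_fraction(density, se_diff, alpha, lower=0.0, upper=2.0, steps=4000):
    """Area under density(d) * P(d) to the right of `lower`, divided by the
    area under density(d) to the right of `lower` (midpoint rule on
    [lower, upper]; `upper` must lie far in the right tail of `density`)."""
    h = (upper - lower) / steps
    num = den = 0.0
    for k in range(steps):
        d = lower + (k + 0.5) * h
        weight = density(d) * h          # approx. share of varieties near d
        num += weight * power(d, se_diff, alpha)
        den += weight
    return num / den


# Hypothetical inputs, NOT the lecture's data:
delta_dist = NormalDist(mu=-0.1, sigma=0.15)  # true sugar excess, % sugar
sigma_plot = 0.3                              # assumed plot-to-plot sd


def se_for(m):
    """Assumed standard error of a variety-minus-standard difference."""
    return sigma_plot * (2 / m) ** 0.5


print("alpha   m   good   best")
for alpha in (0.01, 0.05):
    for m in (5, 10):
        good = detected_fraction(delta_dist.pdf, se_for(m), alpha, lower=0.0)
        best = detected_fraction(delta_dist.pdf, se_for(m), alpha, lower=0.2)
        print(f"{alpha:<6} {m:>3} {good:6.3f} {best:6.3f}")
```

Because P(Δ) rises with Δ, the "best" proportion always exceeds the "good" one for the same α and m, as in the table.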


Fig.4, the little table, and the method of their construction represent the main result of the work of Mrs. Y. Tang. The breeder who now starts 500 new varieties each year, and replicates them only 5 times in his trials, may use her results to construct curves similar to those in Figs. 3 and 4 (pages 84 and 86), and compare the probable results of his work if the number of families started were not 500, but perhaps 400, 300 or 200, with a corresponding increase in the number of replications. Having these results before his eyes he will be able to take into account various economic factors and choose the most economical relation between the number of replications and that of the new families started.
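
The comparison suggested here, fewer families with more replications, can be sketched by holding the total number of plots fixed. Every number is an assumption of this sketch: a field capacity of 500 × 5 = 2500 plots, a hypothetical normal distribution of Δ (mean -0.1, sd 0.15 percent of sugar), a hypothetical plot standard deviation of 0.3, α = 0.05, and a normal approximation to power. The sketch counts expected numbers of good and best varieties detected; the economic weighing is left, as in the lecture, to the breeder.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()
PLOTS = 500 * 5                               # assumed fixed field capacity
delta_dist = NormalDist(mu=-0.1, sigma=0.15)  # hypothetical distribution of true excess
SIGMA_PLOT = 0.3                              # hypothetical plot-to-plot sd
ALPHA = 0.05


def power(delta, m):
    """Normal-approximation P(delta) with m replications; the standard error
    of a difference is assumed to be SIGMA_PLOT * sqrt(2 / m)."""
    se = SIGMA_PLOT * (2 / m) ** 0.5
    return 1 - _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1 - ALPHA) - delta / se)


def expected_detected(n_families, m, lower=0.0, upper=2.0, steps=2000):
    """Expected number of families with true excess above `lower` that the
    trial declares significantly better: n * integral of p(d) P(d) dd."""
    h = (upper - lower) / steps
    area = sum(delta_dist.pdf(lower + (k + 0.5) * h) * power(lower + (k + 0.5) * h, m) * h
               for k in range(steps))
    return n_families * area


for n in (500, 400, 300, 200):
    m = PLOTS // n                            # replications the field allows
    print(f"{n} families x {m} replications: "
          f"good detected {expected_detected(n, m):5.1f}, "
          f"best detected {expected_detected(n, m, lower=0.2):5.1f}")
```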


I might conclude here. But it seems advisable to warn the reader that the actual process of seed breeding is a little more complex than presented above. In fact it is extremely difficult to include in formulas any process of more or less complicated practical work. Such is also the position in the present case. To give an idea of what I mean I may remind you of one thing I have already mentioned: new varieties are tested during more than one vegetative period and in more than one spot. It follows that the method as built up by Mrs. Tang refers to a simplified case. But it is obvious also that, in showing how to calculate the probable results of only one series of field trials, when no such method existed before, she does contribute something to our technique. And even if this is not all that is needed, it is really a lot, because the most difficult part of any problem consists in noticing that there is a problem at all and in advancing any sort of solution. There usually are a lot of people able to introduce the necessary corrections and extensions.


DR. SARLE: What basis do we have for figuring the possibility of including some false good varieties in this area (pointing to Fig. 4)? Will all poor ones be eliminated by this process, or is there a chance of getting some of the poor ones?


DR. NEYMAN: Fig. 4 refers only to those varieties that are really "good." The control of "false good" varieties is kept by choosing a proper level of significance. If you fix α = 0.05, then the chance of the best of the "false good" varieties, those with Δ = 0, being passed as good will be 0.05. On the other hand, the areas under sections of the curves in Fig. 4 give the proportions of those varieties that are really "good."


DR. SARLE: Your method does that automatically?


DR. NEYMAN: Yes, in principle, but we must remember that the method gives only an estimate, which is always liable to error.


DR. SARLE: How does it know which one to pick out? 
DR. NEYMAN: It doesn't. It would be a great thing if it could. All it does is to estimate proportions. If you toss a fair penny* you can never tell exactly when it will fall heads. On the other hand, you may say that, in the long run, the proportion of heads will be about 1/2. Similarly, no statistical method is able to indicate which of the varieties with positive x is really "good" and which is "false good." On the other hand it is possible to estimate the proportion of those that are really "good" and also the proportion of their number which will be detected as "good."


DR. SALMON: This means that with five replications you actually identify only a relatively small percentage of the total number of good varieties?


DR. NEYMAN: Yes, a very small percentage. But we must remember that the accuracy of experiments varies a great deal from year to year, owing to weather conditions. As a matter of fact, in the year 1923, which was also studied by Mrs. Tang, the proportion was found to be much greater than indicated here.


* That is, if you toss it "fairly," which means to toss it so as to duplicate satisfactorily, in a large number N of sets of n throws, the relative frequencies laid out by the binomial (q + p)^n, n being, for instance, 10, or any other convenient number. Cf. Lecture II, pp. 21-23 in particular. Just how this tossing is to be done is an experimental matter, but we have confidence that it can be done, because it has been accomplished in the past.
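
The footnote's notion of a fair toss, reproducing in N sets of n throws the relative frequencies of the binomial (q + p)^n, can be checked by simulation. In this sketch Python's pseudo-random generator stands in for a fairly tossed penny, which is an assumption; N = 20000 and n = 10 are arbitrary choices, and `binomial_check` is a hypothetical name.

```python
import math
import random
from collections import Counter


def binomial_check(n_sets=20000, n=10, p=0.5, seed=1):
    """Simulate n_sets sets of n tosses and compare the relative frequency
    of k heads with the binomial term C(n, k) p^k q^(n - k), q = 1 - p."""
    rng = random.Random(seed)                  # stands in for a fair toss
    heads = Counter(sum(rng.random() < p for _ in range(n)) for _ in range(n_sets))
    return [(k, heads[k] / n_sets, math.comb(n, k) * p ** k * (1 - p) ** (n - k))
            for k in range(n + 1)]


for k, observed, expected in binomial_check():
    print(f"{k:2d} heads: observed {observed:.4f}, binomial {expected:.4f}")
```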
ON STATISTICAL METHODS IN SOCIAL AND ECONOMIC RESEARCH
Census by Sampling and Other Problems


A conference with Dr. Neyman in the auditorium of the Department of Agriculture, 8th April 1937, 8 p.m., Dr. Frank M. Weida, Professor of Statistics at the George Washington University, presiding.


I have received a number of questions for discussion at this meeting, and they lead me to believe that what I originally planned to talk about, namely some results of a particular research connected with systems of social health insurance, should be omitted. The questions asked are of extreme importance and I think they will take just the time that is available. I tried to classify those questions, and I shall try to answer them in groups. But it may be that such a collective answer will not be sufficient and then you will please simply ask additional questions.


There is a group of questions concerning the method of sampling. I think this is a very important question and I shall dwell some time on it. The typical question is how to get a good sample that will give sufficient data to estimate, say, the number of unemployed, the amount of money spent by the unemployed on certain kinds of commodities, and so on. There is a certain sum available for the inquiry and we have to decide what is the best way to use the money to obtain a good sample.


One particular question was asked referring to 300 cities, and the question was how to take a sample of them. It is suggested in this question that out of the 300 cities some 25 should be selected to represent the whole and that in each of the cities selected an exhaustive inquiry (complete census) should be carried out. The enumeration of all the workers in these 25 cities, of all the unemployed, the averages of moneys spent by them on various commodities, etc., should then be used to judge what happens in the 300 cities as a whole. I am expected to answer the question how best to select the sample of 25 cities that will be used for the above purpose.


I shall not answer this question. Instead I am going to advise as strongly as I can to drop the proposed method of sampling altogether. It is most dangerous and is practically certain to lead to deplorable results. By this, of course, I do not mean that a successful inquiry by a sample is impossible. On the contrary, my opinion is that the sampling method may be most useful and may provide very accurate results. What I emphatically protest against is the selection of any 25 cities for a complete census (as of the whole population, or of the employed or the unemployed, etc.), with a total omission of the remaining 275 cities.


Broadly speaking, there are two essentially different methods of sampling that are used in social work. One is called the method of purposive selection, the other that of random sampling. This subdivision is a little artificial but, owing to the fact that it is used in a special report* on the method, presented to the International Statistical Institute, it is generally accepted.


The method consisting in a selection of 25 cities out of 300 and in limiting the investigation to those 25 cities only falls under the heading of "purposive selection." The mere question addressed to me, how those cities should best be selected, suggests that the selection was not meant to be random, at least not entirely random. Usually it is suggested that the sample of the cities should be so selected that the averages of certain characters, called controls, calculated for the sample and for the universe should be in as close agreement as possible; this circumstance justifies the term "purposive selection." But it is not the limitation of the randomness of sampling that makes the method dangerous. In fact, if it were only the question of random sampling, I could easily answer it by saying that the best way of selecting the 25 cities is to draw them at random.


The trouble with the method lies in the fact that if we try to select things (cities, districts, etc.) "purposely," the total number of such units that might be selected must necessarily be small, and therefore the units themselves must be rather large. In your case you have 300 units out of which only 25 are to be selected. Each unit of selection is a city inhabited probably by tens of thousands of people, possibly more, and the differences between the units may be enormous. This is a rough description of the method called "purposive selection."


The nomenclature "purposive selection" and "random sampling" is not very felicitous, as I have already indicated. It does not describe the essential difference between the two methods as they are applied in practice. The first method, that of "purposive selection," consists in dividing the whole population into a comparatively few (say 300) large groups (e.g., cities) or units, of which some 20 or 30 are selected "purposely." The essential feature of the other method is that the same population is divided into a much larger number (say 100,000 or more) of small groups (e.g., families, inhabitants of single houses, blocks, etc.), of which around 1000 or more are selected to form a sample, either entirely at random or at random with some restrictions.


The first method is hopeless, the other extremely useful. Those of you who would like to see theoretical reasons for this opinion will find them in an article of mine.** Here I will give you an intuitive illustration of the ideas expressed there. Suppose we have a hundred dollars that we decide to use for gambling, with fair play. If we divide the whole sum into, say, five parts of $20 each and bet only five times, it is impossible to make any reliable prediction of what may be the result. We may lose all our money or, equally easily, we may double it. On the other hand, if we make a hundred bets at $1 each, then we may make some predictions with fair hope of success. The result of the game still remains uncertain, but it would be rather surprising if the sum won or lost exceeded around $20. The accuracy of the prediction would be still greater if, instead of making 100 bets at $1, we were to bet a dime 1000 times.


* A. L. Bowley, "On the precision attained in sampling," Bull. Int. Stat. Inst., 1926.


** J. Neyman, "On the two different aspects of the representative method," Journal of the Royal Stat. Soc. 97, 558-625, 1934.
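
The arithmetic behind this illustration: with a fair even-money bet the standard deviation of the net result is the stake times the square root of the number of bets, so splitting $100 into 5, 100 or 1000 bets gives about $44.7, $10 and $3.2. The sketch below also computes, exactly from the binomial distribution, the chance that the net gain or loss exceeds $20. Treating each bet as winning or losing its stake with probability 1/2 is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def outcome_sd(total, n_bets):
    """Standard deviation of the net result when `total` dollars are split
    into n_bets equal fair even-money bets (each wins or loses its stake)."""
    stake = total / n_bets
    return stake * math.sqrt(n_bets)


def prob_net_exceeds(total, n_bets, limit):
    """Exact probability that |net result| > limit, from the binomial
    distribution of the number of winning bets (win probability 1/2)."""
    prob = 0.0
    for wins in range(n_bets + 1):
        # net = (total / n_bets) * (2 * wins - n_bets); compare in integers
        # (valid when total and limit are whole dollars) to avoid rounding.
        if abs(2 * wins - n_bets) * total > limit * n_bets:
            prob += math.comb(n_bets, wins) / 2 ** n_bets
    return prob


for n_bets in (5, 100, 1000):
    print(f"{n_bets:4d} bets of ${100 / n_bets:g}: sd ${outcome_sd(100, n_bets):.2f}, "
          f"P(|net| > $20) = {prob_net_exceeds(100, n_bets, 20):.4f}")
```

For five $20 bets the probability of a result beyond ±$20 is 12/32 = 0.375 (every outcome except 2 or 3 wins); for a hundred $1 bets it falls below 0.05, and for a thousand dimes it is negligible.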


Those are perfectly intuitive propositions and you will notice that they have a definite bearing on the problem of sampling human populations. The advice against selecting 25 cities out of the total of 300 is based not only on theoretical considerations; some practical experience is available to show what might be the result of an inquiry if this method is applied.


In 1926 or 1927 two Italian statisticians, Gini and Galvani,* had to solve a problem of a kind exactly similar to the one contemplated here. They had to deal with the data of a general census. The data were worked out, a new census was approaching and it was necessary to clear the room for the new data; the old data were to be destroyed, but the statistical office considered it useful to have a representative sample from these data of the previous census, because it seemed just possible that in the future some new problem would arise, and it would be convenient to have the material. Therefore they decided to take a sample from the old data that could represent the situation in the whole of Italy.


They considered carefully the problem of how to obtain a better sample, and took into account the report to the International Statistical Institute. After a certain amount of discussion the Italian scientists decided to apply the method of purposive selection. The whole of Italy was divided into 214 administrative districts called circondari. Those districts are large, some of them containing over a million inhabitants. Out of those they decided to select a sample of 29. You see, the size of their sample was larger, compared with the universe, than in the case we were discussing, viz. 25 towns from a population of 300 towns.


Various averages for each circondario had been calculated previously. Gini and Galvani selected 12 characters of the circondari to serve as controls, and subdivided these controls into essential and secondary. They tried to select the 29 circondari so as to have the means of the essential controls calculated from the sample practically identical with those for the whole population. They tried also to reach a reasonable agreement between the population and the sample means of the secondary controls. If you look at the figures, you will find that the agreement between the means of all the controls in the sample and the means of the same controls in the population is very good. I don't know exactly what happened next. I have the impression that the statistical office discarded the rest of the material and kept the sample. However that may be, the authors tried to see whether the sample they had selected was a success and whether it showed satisfactory agreement with the population also in other respects, besides the averages of the controls. The result was very bad. They found that the distributions and also the correlations, in fact all characters except the means of the controls, as found in the sample, were in extreme disagreement with those in the population. The diagram on the next page, one of the many diagrams that the Italian scientists published themselves, gives an idea of the disastrous results that are apt to follow the sampling of big units. The proposed method (page 89) of selecting 25 cities out of 300 is likely to produce a similar result. Having discovered that their sample of 29 circondari is not representative of the whole population at all, the Italian statisticians expressed the opinion that it is generally impossible to obtain a sample that, as it were, would reproduce the sampled population with all its properties. Strictly speaking, of course, they are correct. There was in Italy in 1926 but one Marchese Marconi, the great inventor in the field of wireless telegraphy. Whatever the method of sampling, the proportion of Marconis in the sample could not be equal to that in the population. But we do not take samples to establish such proportions; and both theory and experience indicate that whenever we have in mind any really practical problem of estimating means, regressions, etc., a sample properly drawn is for all practical purposes sufficient.


* Corrado Gini and Luigi Galvani, "Di una applicazione del metodo rappresentativo all'ultimo censimento italiano della popolazione," Annali di Statistica, Serie VI, vol. 4 (107 pages), 1929.


Now let us consider what is to be done to get a reliable sample. We must here rely on the theory of probability and work with great numbers. "Great numbers" does not mean great numbers of people included in the sample, but great numbers of random samplings, or great numbers of units that we draw separately. The sample of 25 cities, or that of 29 circondari, contains a great number of people, but from the point of view of sampling theory they are both small samples because they are composed of 25 or 29 units respectively. For a sample to be reliable the number of units must be large.
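
The point about "great numbers" of separately drawn units can be tried out on an artificial population. In this hypothetical sketch, 300 cities of 40 small units each have values made up of a city effect plus local variation (both standard normal, an assumption). Two designs using the same 1000 units are compared over repeated drawings: 25 whole cities, chosen at random here (which, if anything, favours the large-unit design compared with purposive choice), and a simple random sample of one-twelfth of the small units. The mean squared error of the estimated population mean is reported for each; with these settings the whole-city design should show an error many times larger.

```python
import random
import statistics

rng = random.Random(1937)
N_CITIES, UNITS_PER_CITY = 300, 40          # hypothetical sizes

# Hypothetical population: unit value = city effect + local variation.
cities = []
for _ in range(N_CITIES):
    effect = rng.gauss(0.0, 1.0)
    cities.append([effect + rng.gauss(0.0, 1.0) for _ in range(UNITS_PER_CITY)])
all_units = [v for city in cities for v in city]
true_mean = statistics.fmean(all_units)
SAMPLE_SIZE = len(all_units) // 12           # 1000 units either way


def whole_cities_estimate():
    """25 whole cities (chosen at random here), every unit in them enumerated."""
    chosen = rng.sample(range(N_CITIES), 25)
    return statistics.fmean(v for c in chosen for v in cities[c])


def small_units_estimate():
    """Simple random sample of one-twelfth of all the small units."""
    return statistics.fmean(rng.sample(all_units, SAMPLE_SIZE))


def mse(estimator, reps=500):
    """Mean squared error of the estimator over repeated drawings."""
    return statistics.fmean((estimator() - true_mean) ** 2 for _ in range(reps))


mse_cities = mse(whole_cities_estimate)
mse_units = mse(small_units_estimate)
print(f"25 whole cities:   MSE = {mse_cities:.5f}")
print(f"1000 random units: MSE = {mse_units:.5f}")
```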


It follows that instead of dividing your population into 300 parts, each inhabiting a particular city, you have to carry the subdivision much further. Probably the best thing would be to divide the whole into small groups inhabiting single houses or blocks. All those groups, which I shall call units of sampling, or simply units, must be listed, and this necessity of listing usually provides a limit to the tendency of having the units as small as possible. When this is done, you will be able to select a random sample of the units, which may be one-twelfth of the whole, as in the contemplated sample of 25 cities out of 300, but probably could be much smaller, without any serious detriment to the accuracy.
The process of random sampling may be of various forms, which are not indifferent from the point of view of accuracy of the results. The first attempt at a serious study of the relation between the method of sampling and the accuracy of the results was made by Bowley and is described in his report to the International Statistical Institute already mentioned. The main results are as follows.


The sampling is called unrestricted if at each drawing each of the elements forming the population studied has the same chance of being drawn. To illustrate this idea I shall point out that, in the case of the population formed by the inhabitants of the 300 cities, an unrestricted sampling combined with bad luck can produce a sample composed only of elements from 25 cities with the complete omission of the others. This, however, is extremely unlikely.


More accurate results could be obtained by what Bowley calls stratified, and what I call stratified proportional, sampling. This consists in a two-fold subdivision of the population studied. We first divide it into a conveniently great number of larger parts, called strata. Those may be your 300 cities or some 600 halves of the cities, etc. Next, each stratum is divided into units of sampling. If it is decided to work with a sample of one-twelfth, then you select at random one-twelfth of the units out of each stratum separately. This makes it impossible for the sample to be devoid of units representative of larger sections of the population.


It is obvious, and this presumption is supported by theory, that the more homogeneous the single strata, the better the effect of stratification. Therefore, if a city is divided into two or more parts, one inhabited by the well-to-do, the other by poorer people, still another being a shopping district, etc., all those parts should be treated as separate strata.


It may be useful to emphasize here that homogeneity of a stratum does not necessarily mean equality or similarity of all the people inhabiting or forming a stratum. In fact homogeneity of a stratum or of a population means a comparative similarity of the units of sampling. If the population of a town is composed of representatives of 10 different races all in the same proportions, then probably we should say that from the racial point of view that population is very heterogeneous. However, from the point of view of sampling it will be ideally homogeneous if it happens that the racial composition of any of its units is exactly the same as that of the whole population. It is seen that the internal heterogeneity of sampling units goes together with an external homogeneity of those units within the population. This is a general rule.
It follows that the choice of the units of sampling of a fixed size is not indifferent from the point of view of the accuracy of an investigation by sample. Mr. Frederick F. Stephan tells me that an investigation has shown a greater similarity between the inhabitants of the two sides of one street than between those of the opposite sides of the same block. It follows from what I said that if it were contemplated to divide the population into units of sampling alternatively composed of the inhabitants of the two sides of sections of single streets or of the two sides of single blocks, the latter method would give more homogeneous units and therefore a greater accuracy of sampling.


The gain in accuracy due to stratification is considerable, but it is possible to go beyond what was advised by Bowley. A cursory glance at the situation suggests that the rule of selecting randomly the same proportion of units out of each stratum may not be the best procedure imaginable. You could not expect that all the strata will be internally equally homogeneous. To make the situation clear, suppose that one of the strata, A, is ideally homogeneous, while some other, B, is fairly heterogeneous. Then, to know all about the stratum A it will be sufficient to take out of it one unit of sampling only. On the other hand, an accurate estimate of the properties of B would require a sample of considerable size. If we decide to sample both A and B in proportion to their sizes (= the number of elements of sampling they contain), then we shall "oversample" A and "undersample" B. This intuitive reasoning could be put into exact form* and the result is as follows.


Denote by U the average referring to the whole population that it is desired to estimate from the sample with the greatest possible accuracy. Suppose for example that U is the average income per family of the unemployed. Denote by s the total number of strata, by M_i the total number of units of sampling forming the ith stratum, by n_ij the number of unemployed families within the jth unit of sampling belonging to the ith stratum, and finally by u_ij the total of the incomes of those n_ij families. With this notation we shall have

$$U = \frac{\sum_{i=1}^{s}\sum_{j=1}^{M_i} u_{ij}}{\sum_{i=1}^{s}\sum_{j=1}^{M_i} n_{ij}} \qquad (1)$$

Let us also denote by m_i the number of sampling units that we decide to select at random from the M_i possible units in the ith stratum.


* See my article in the Journal of the Royal Statistical Society already referred to on page 90.


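It is possible to show that the greatest accuracy in estimating the numerator in formula (1) is attained when the numbers m_i of elements actually drawn in the sample from the M_i possible sampling units in the ith stratum are proportional to the products M_iσ_i, where σ_i is the standard deviation of the u_ij within the ith stratum; that is, when

$$m_i = m_0\,\frac{M_i\sigma_i}{\sum_{i=1}^{s} M_i\sigma_i},$$

where m_0 = Σ m_i is the total number of units in the sample, so that the ratio m_i/(M_iσ_i) is constant for all the strata, i = 1, 2, ..., s.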
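
The contrast between proportional allocation and allocation proportional to M_iσ_i can be made concrete with the usual variance of a stratified estimate of the numerator of (1) under simple random sampling without replacement within strata. The three strata, their sizes M_i and within-stratum standard deviations σ_i (here `S`, with divisor M_i - 1) are invented for illustration: a homogeneous farming district, a very mixed city, and something in between. The helper names `allocate` and `var_total` are hypothetical.

```python
import math


def allocate(m0, weights):
    """Split m0 sampling units among strata in proportion to `weights`,
    rounding by largest remainders so that the parts add up to m0."""
    raw = [m0 * w / sum(weights) for w in weights]
    alloc = [math.floor(r) for r in raw]
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in by_remainder[: m0 - sum(alloc)]:
        alloc[i] += 1
    return alloc


def var_total(M, S, m):
    """Variance of sum_i M_i * xbar_i under simple random sampling without
    replacement of m_i out of M_i units in stratum i; S_i is the
    within-stratum standard deviation with divisor M_i - 1. Needs m_i >= 1."""
    return sum(Mi ** 2 * Si ** 2 / mi * (1 - mi / Mi) for Mi, Si, mi in zip(M, S, m))


# Hypothetical strata: homogeneous farming district, very mixed city, intermediate.
M = [4000, 3000, 5000]     # units of sampling per stratum
S = [5.0, 60.0, 20.0]      # assumed within-stratum sd of u_ij
m0 = 600                   # total units to be drawn

proportional = allocate(m0, M)
optimal = allocate(m0, [Mi * Si for Mi, Si in zip(M, S)])
for name, m in (("proportional", proportional), ("M_i * sigma_i", optimal)):
    print(f"{name:>13}: m_i = {m}, variance of estimated total = {var_total(M, S, m):,.0f}")
```

With these numbers proportional allocation gives [200, 150, 250] and the M_iσ_i rule gives [40, 360, 200]: the homogeneous stratum is sampled far less, the mixed one far more, and the variance of the estimated total drops by about 44 percent.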


I have denoted above by u_i1, u_i2, ..., u_iM_i the totals of incomes of the families of the unemployed belonging to particular sampling units within the ith stratum. Denote now by x_i1, x_i2, ..., x_im_i those of the u_ij that correspond to units actually included in the sample of m_i units drawn from the ith stratum, and let x̄_i be their mean. If the denominator N in (1) is known, which is frequently the case, then the average U will be estimated by

$$X = \frac{1}{N}\sum_{i=1}^{s} M_i\,\bar{x}_i,$$

where N, the denominator in (1),

$$N = \sum_{i=1}^{s}\sum_{j=1}^{M_i} n_{ij},$$

is the total number of the families of the unemployed within the population studied. The squared standard error of X is given by

[...]

are the total numbers of sampling units forming the whole population and the whole sample respectively.
It will be seen that, of the three terms of this formula, only one depends on the numbers m_i of the units of sampling selected from particular strata. Therefore, any change in these numbers may influence only this term, which has the minimum value of zero whenever the m_i satisfy the condition

$$m_i = m_0\,\frac{M_i\sigma_i}{\sum_{i=1}^{s} M_i\sigma_i} \qquad (4)$$
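
A standard way to see why only one term depends on the m_i, offered as a sketch: it assumes simple random sampling without replacement within each stratum and takes σ_i² with divisor M_i - 1, so it need not match the lecture's own formula term by term. With m_0 = Σ m_i:

```latex
\sigma_X^2 = \frac{1}{N^2}\sum_{i=1}^{s} M_i^2\,\frac{\sigma_i^2}{m_i}\Bigl(1-\frac{m_i}{M_i}\Bigr)
           = \frac{1}{N^2}\Bigl[\sum_{i=1}^{s}\frac{M_i^2\sigma_i^2}{m_i} - \sum_{i=1}^{s} M_i\sigma_i^2\Bigr],

\sum_{i=1}^{s}\frac{M_i^2\sigma_i^2}{m_i}
  = \frac{1}{m_0}\Bigl(\sum_{i=1}^{s} M_i\sigma_i\Bigr)^2
  + \sum_{i=1}^{s}\frac{1}{m_i}\Bigl(M_i\sigma_i - \frac{m_i}{m_0}\sum_{k=1}^{s} M_k\sigma_k\Bigr)^2,

\sigma_X^2 = \frac{1}{N^2}\Bigl[\frac{1}{m_0}\Bigl(\sum_{i=1}^{s} M_i\sigma_i\Bigr)^2
  - \sum_{i=1}^{s} M_i\sigma_i^2
  + \sum_{i=1}^{s}\frac{1}{m_i}\Bigl(M_i\sigma_i - \frac{m_i}{m_0}\sum_{k=1}^{s} M_k\sigma_k\Bigr)^2\Bigr].
```

For fixed m_0 only the third term depends on the m_i. It is a sum of squares, and it vanishes exactly when m_i = m_0 M_iσ_i / Σ M_kσ_k, which is condition (4).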

If the denominator N in Eq. (1) is not known, then U will be estimated by a ratio of estimates of the numerator and the denominator separately. Some of the questions addressed to me refer to this situation. The estimate of accuracy of the estimate of U now involves the application of some remarkable theorems of S. Bernstein* and of R. C. Geary,** but these matters are a little too complicated to describe here, and I shall have to refer those interested to my article in the Journal of the Royal Statistical Society already mentioned, page 90.
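
The two estimates discussed above, X when N is known and a ratio of separate estimates when it is not, can be compared on an invented population. In this sketch each sampling unit is a pair (n_ij, u_ij); the stratum sizes, the numbers of unemployed families per unit and their incomes are hypothetical random numbers, the sample sizes m_i are arbitrary, and the function names are hypothetical. No standard errors are computed; for those the lecture refers to the article in the Journal of the Royal Statistical Society.

```python
import random

rng = random.Random(7)

# Hypothetical population: each stratum is a list of sampling units, and each
# unit is a pair (n_ij, u_ij) = (unemployed families, their total income).
strata = []
for size, mean_income in [(800, 40.0), (1200, 25.0), (500, 60.0)]:
    units = []
    for _ in range(size):
        n_ij = rng.randint(0, 6)
        u_ij = sum(rng.gauss(mean_income, 10.0) for _ in range(n_ij))
        units.append((n_ij, u_ij))
    strata.append(units)

N = sum(n for units in strata for n, _ in units)        # denominator of (1)
U = sum(u for units in strata for _, u in units) / N     # true average, (1)


def stratified_sample(m):
    """Draw m[i] units at random, without replacement, from stratum i."""
    return [rng.sample(units, m_i) for units, m_i in zip(strata, m)]


def estimated_total(sample, pick):
    """Sum over strata of M_i times the sample mean of pick(unit)."""
    return sum(len(units) * sum(pick(unit) for unit in s) / len(s)
               for units, s in zip(strata, sample))


def estimate_known_n(sample):
    """X = (1/N) * sum_i M_i * xbar_i, using the known denominator N."""
    return estimated_total(sample, lambda unit: unit[1]) / N


def estimate_ratio(sample):
    """N unknown: ratio of the estimated numerator to the estimated denominator."""
    return (estimated_total(sample, lambda unit: unit[1])
            / estimated_total(sample, lambda unit: unit[0]))


sample = stratified_sample([80, 120, 50])
print(f"U = {U:.3f}, X = {estimate_known_n(sample):.3f}, ratio = {estimate_ratio(sample):.3f}")
```

If every unit is drawn (m_i = M_i), both estimates reproduce U exactly, which is a convenient check on the bookkeeping.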


The adjustment of the numbers of samplings to be carried out within a single stratum is particularly important whenever we know on some a priori grounds that certain strata are more heterogeneous than others. Such will be the case if some of your strata are cities with very mixed populations not permitting partition into more homogeneous parts, while other strata are uniform agricultural districts. In the light of formula (4), the purpose of stratification now becomes a little different from what it was before. Previously, it consisted only in getting strata as homogeneous internally as possible, though differing between themselves. Now we have an additional means of increasing the accuracy of the results by isolating into separate strata such parts of the population as are heterogeneous and by sampling them more heavily than the others.


It is obvious that the adjustment of the m_i to conform to Eq.(4) requires knowledge of the σ_i. Those are never known before the investigation, otherwise the investigation itself would be unnecessary. The difficulty may be overcome in many ways:


1. When, apart from intuitive feeling, we have absolutely no previous knowledge concerning the variability of particular strata, it is useful to divide the inquiry planned into two parts: (a) a preliminary investigation, and (b) the main investigation. The preliminary investigation consists in drawing at random very small samples, of n = 20 or 30, out of each stratum and using them to estimate the σ_i. If the estimate of σ_i is denoted by s_i, then the total number m_0 of samplings intended should be divided between the strata in proportion to M_is_i, so that

$$m_i = m_0\,\frac{M_i s_i}{\sum_{i=1}^{s} M_i s_i} \qquad \text{(compare with Eq. 4)} \qquad (4a)$$
The main investigation would consist in selecting an additional m_i - n units out of each stratum. The preliminary inquiry will be useful also from the point of view of training the enumerators. Let me emphasize that the sample of the preliminary inquiry need not be large. In this respect see P. V. Sukhatme, "Contribution to the theory of the representative method," J. Roy. Stat. Soc. Supplement, Vol. II, pp. 253-268.


2. In other cases we may have some previous knowledge of the strata considered. For example, in the process of working out the data of some previous general census, certain characters of the same units of sampling may have been calculated. Alternatively, the data for such calculations may be available. Those might not concern the character U, but it will be sufficient if we have information concerning the variability of the sampling units with respect to some character, say v, correlated with U. An adjustment of the numbers of samples according to the variability of v will be more or less equivalent to the adjustment according to σ_i.


* S. Bernstein, "Sur l'extension du théorème limite du calcul des probabilités," Mathematische Annalen 97, 1-59, 1926.


** R. C. Geary, "The frequency distribution of the quotient of two normal variates," J. Royal Stat. Soc. 93, 442-446, 1930.
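
The two-phase procedure of item 1 can be sketched as follows, again on invented data: three strata whose spreads are unknown to the sampler, a preliminary sample of n = 25 units per stratum to obtain the s_i, the allocation (4a), and then the additional m_i - n units. Rounding may make the m_i add up to a number slightly different from m_0, and a stratum whose m_i falls below n simply keeps its preliminary sample; both are choices of this sketch, not rules from the lecture. The function name `two_phase` is hypothetical.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical strata (values u_ij); the spreads 5, 40 and 15 are unknown
# to the sampler and have to be estimated by the preliminary inquiry.
strata = [[rng.gauss(50.0, sd) for _ in range(size)]
          for size, sd in [(3000, 5.0), (2000, 40.0), (4000, 15.0)]]


def two_phase(strata, m0, n=25):
    """Preliminary inquiry of n units per stratum gives s_i; the m_i then
    follow (4a), m_i proportional to M_i * s_i, and the main inquiry draws
    the additional m_i - n units from the units not yet drawn."""
    pools = [rng.sample(units, len(units)) for units in strata]  # random order
    s = [statistics.stdev(pool[:n]) for pool in pools]
    weights = [len(units) * s_i for units, s_i in zip(strata, s)]
    m = [round(m0 * w / sum(weights)) for w in weights]
    # The first n units of each shuffled pool are the preliminary sample; a
    # stratum with m_i < n keeps those n units (they cannot be undrawn).
    samples = [pool[:max(m_i, n)] for pool, m_i in zip(pools, m)]
    return s, m, samples


s, m, samples = two_phase(strata, m0=600)
estimate = sum(len(units) * statistics.fmean(smp) for units, smp in zip(strata, samples))
true_total = sum(sum(units) for units in strata)
print("s_i =", [round(x, 1) for x in s], " m_i =", m)
print(f"estimated total {estimate:,.0f} vs true total {true_total:,.0f}")
```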


MR. STOCK: If you were measuring a number of characteristics, which one would you choose?


DR. NEYMAN: I welcome this question. It is true that we practically never plan an inquiry in order to determine just one single mean. But usually it is not difficult to see that one of them is of greater importance than the others. If such be the case, then the numbers of samplings should be adjusted accordingly. Alternatively, if there are several characteristics of equal importance, we may look for one that could be called the basic characteristic, and which would have the property of being correlated with the ones that we are interested in. This correlation may be positive or negative, but the resulting correlation between the corresponding σ_i will always be positive. Therefore an adjustment according to the basic characteristic will always tend to increase the accuracy of the estimates of the other characteristics as well.


DR. WELCOX: If you had been advising the Italian census people, what specific advice would you have given?


DR. NEYMAN: I would have advised them to consider their circondari not as units of sampling but as strata. These strata should be subdivided into units of sampling as small as the character of the material would permit: parishes, streets, single houses, whatever is possible. As a matter of fact I remember seeing a footnote in Gini and Galvani's paper in which they themselves suggest that probably their results would be more satisfactory if instead of sampling circondari they would sample parishes. In this of course they are perfectly correct. The results would have been better still if they had sampled proportionately to the sizes of the strata.

There is a further difficulty in carrying out an inquiry based
on a random sample, which seems to be worth mentioning. This is
of a psychological nature. Generally we do not rely on random sampling.
Intuitively we are inclined to think that it is not wise to rely on
chance when there is some knowledge available that might guide our
steps. I have seen many instances where a feeling of a similar kind
has made it difficult to reach a decision on how an inquiry should be
carried out. I remember very well the doubts that I had myself. "That's
all right in theory," I thought, "but how would this random sampling
work in practice?" Then a great discovery showed me how to make up
my mind; and since that discovery has worked well with other people, I
shall pass it on to you. It consists in a simple rule: try it and see.
So far as our intuitive feeling against some theoretical result is
concerned, there is nothing like an experiment. In the case of a planned
inquiry by sampling, and the question of how to sample, I would take
something like 1000 sheets from census data or the like, consider them
as a sampled population, and perform on them in detail all the steps of
the several alternative methods of sampling that are contemplated. But
I must add a few warnings.


(a) The population in this experimental sampling must be
sufficiently heterogeneous, like the populations that we study in practice.


(b) The random sample you draw in the experimental study must
contain a sufficient number of units, say 80 or 100.


I am certain that a few trials of this sort will appeal to your
intuition and will give you a comfortable feeling of safety in random
sampling, in spite of the fact that in sampling randomly you will
sometimes ignore knowledge of certain principles. But it must be
remembered that in following the indications of the theory you will make
use of some other kind of knowledge, that of mathematical statistics.
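The "try it and see" experiment can be imitated on a computer. The sketch below is an illustration under stated assumptions, not a prescribed procedure: the 1000 "census sheets" are a synthetic, deliberately heterogeneous population (a hypothetical mixture of a skewed group and a group of large units), samples of 100 units are drawn by simple random sampling without replacement, and we count how often the interval of two standard errors around the sample mean contains the true population mean.

```python
import math
import random
import statistics

rng = random.Random(1)

# Hypothetical "1000 census sheets": a deliberately heterogeneous population
# mixing a skewed group (two thirds of the units) with a group of large units.
population = [rng.expovariate(1 / 50) if i % 3 else rng.gauss(200, 40)
              for i in range(1000)]
N = len(population)
true_mean = statistics.fmean(population)

n, trials = 100, 2000
means, covered = [], 0
for _ in range(trials):
    sample = rng.sample(population, n)            # simple random sample, no replacement
    m = statistics.fmean(sample)
    # estimated standard error, with the finite-population correction
    se = statistics.stdev(sample) / math.sqrt(n) * math.sqrt(1 - n / N)
    means.append(m)
    covered += abs(m - true_mean) <= 2 * se       # does m +/- 2 SE contain the truth?

coverage = covered / trials
print(f"true mean {true_mean:.1f}, average of sample means {statistics.fmean(means):.1f}")
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

Replacing `rng.sample` by any purposive selection rule and rerunning the loop is the kind of side-by-side trial suggested above.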


DR. WILCOX: I have read that article of Dr. Neyman's (footnote,
page 90), and I noticed that he spoke of his work in connection with the
Polish census at the same time that he was commenting on the work in
Italy on the Italian census; therefore I asked him, if he had been the
adviser for the Italian census, what advice he would have given in view
of the fact that their cards were stacked, and of the difficulty of
rearranging them. I would like to ask a question that is somewhat related
to this matter of drawing the sample. It is fairly common practice to take
a list of the elements of sampling, to start with one that is selected by
some device or other, and then to take every tenth or twentieth on down
the list and make up the sample that way, instead of setting up a set of
random numbers or drawing numbers at random and selecting the sample
according to that little model or game of chance. Are there any
advantages or disadvantages that one should bear in mind in making use
of the device of taking every tenth name on the list, every tenth family,
house, or district?


DR. NEYMAN: I think there is a definite advantage in using a
mechanical process of random sampling throughout; that is to say, in not
taking every tenth unit as listed. Sometimes it will not improve
anything, and your tenth or twentieth house will be as good. But there is
just the possibility, especially in new, properly planned towns, that
if you take every twentieth or fifteenth house you will be
synchronized with something very essential in the town itself. We know
of one small inquiry where they took a sample of houses in a few
villages. The houses were numbered, and they decided to take every
fifth or perhaps every tenth, and they obtained something very surprising.
Eventually they found out that the first house always was the one
belonging to the squire. In new towns you can expect that every block
will have the same number of houses, and if you take every fifth house,
you may either omit the corners or include all of them in
the sample, and this may introduce a considerable bias.
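A toy version of the corner-house trap, with invented numbers: blocks of five houses in which the first (corner) house is larger. With a step of 5, each of the five possible systematic samples consists either entirely of corner houses or of none, so every single sample misses the true mean badly, even though the average over all five starting points equals it.

```python
HOUSES_PER_BLOCK = 5
N_BLOCKS = 40

# Hypothetical persons per house: the corner house (first of each block) is larger.
sizes = [12 if pos == 0 else 4
         for _ in range(N_BLOCKS) for pos in range(HOUSES_PER_BLOCK)]
true_mean = sum(sizes) / len(sizes)

def systematic_sample(values, step, start):
    """Every step-th unit, beginning at 0-based index start."""
    return values[start::step]

step = 5
means = [sum(s) / len(s)
         for s in (systematic_sample(sizes, step, r) for r in range(step))]
print(true_mean, means)   # 5.6 [12.0, 4.0, 4.0, 4.0, 4.0]
```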


It is essential to be clear about the exact nature of the
procedure suggested. It is this. We take the first ten units of sampling as
listed and select one of them at random. Let x be its order number.
Then to form the sample we take the units numbered x, x + 10, x + 20,
and so on. It will be seen that this procedure is equivalent to a division
of the population sampled into 10 parts, thus:

Part 1: units 1, 11, 21, 31, ...
Part 2: units 2, 12, 22, 32, ...
...
Part 10: units 10, 20, 30, 40, ...

Next we treat those parts as units of sampling and take only one of them
to form a sample.
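The equivalence can be checked directly. In the sketch below (hypothetical helper names), `systematic_parts` lists the ten parts, and `systematic_variance` computes the exact variance of the systematic-sample mean over the ten equally likely starting points, which can be compared with the variance of a simple random sample of the same size. The population is an artificial one with a period of ten, chosen to show how badly things go when the parts are not alike; it assumes the number of units is a multiple of ten, so that all parts have equal size.

```python
import statistics

def systematic_parts(n_units, k):
    """The k parts {x, x + k, x + 2k, ...} for x = 1, ..., k (1-based unit numbers)."""
    return [list(range(x, n_units + 1, k)) for x in range(1, k + 1)]

def systematic_variance(values, k):
    """Exact variance of the systematic-sample mean over the k equally likely
    random starts (values[x::k] is the part beginning at 0-based index x)."""
    part_means = [statistics.fmean(values[x::k]) for x in range(k)]
    return statistics.pvariance(part_means)

def srs_variance(values, n):
    """Variance of the mean of a simple random sample of n units drawn
    without replacement: S^2 / n * (1 - n / N), with S^2 using divisor N - 1."""
    N = len(values)
    return statistics.variance(values) / n * (1 - n / N)

parts = systematic_parts(100, 10)
print(parts[2][:4])        # [3, 13, 23, 33]

# An artificial population with period 10: every tenth unit is "special".
periodic = [1.0 if i % 10 == 0 else 0.0 for i in range(100)]
print(systematic_variance(periodic, 10), srs_variance(periodic, 10))
```

Here only ten samples are possible, and nine of them contain none of the special units; the systematic variance comes out about eleven times the random-sampling variance.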


Obviously, if we proceed in this way we do not rely on the
theory of probability but only on good luck, hoping that the ten parts
into which the whole population is divided are very similar from one to
another. I would recommend that one rely on chance as governed by the
empirical law of big numbers, but I would not recommend that one rely
on good luck.


As a matter of fact, there are no special difficulties in sampling
randomly. There is a very useful little book of Tippett's, Random
Sampling Numbers,* which may be recommended for the purpose. If your
sampling units are listed and numbered, to take a random sample of them
you simply open the book and read in turn a sufficient number of numbers.
Whenever the same number appears twice you simply ignore it. You ignore
also all numbers exceeding the total number of your sampling units.
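Reading a table of random numbers in this way amounts to rejection sampling, which can be sketched as follows. A pseudo-random digit generator from Python's standard library stands in for the printed table (an assumption made only for illustration); numbers are read with as many digits as the largest unit number, and zero, numbers above the population size, and repeats are ignored.

```python
import random

def random_digits(seed):
    """Stand-in for reading a printed table of random digits, one digit at a time."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(10)

def draw_sample(n_units, sample_size, digits):
    """Select sample_size distinct unit numbers from 1..n_units by reading
    numbers of fixed width from the digit stream and ignoring zero,
    numbers larger than n_units, and repeats."""
    if sample_size > n_units:
        raise ValueError("sample larger than population")
    width = len(str(n_units))
    chosen, seen = [], set()
    while len(chosen) < sample_size:
        number = int("".join(str(next(digits)) for _ in range(width)))
        if number == 0 or number > n_units or number in seen:
            continue                      # ignore out-of-range numbers and repeats
        seen.add(number)
        chosen.append(number)
    return chosen

sample = draw_sample(500, 20, random_digits(7))
print(sorted(sample))
```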


* See footnote on page 14.

DR. LANG: I don't see how this system should be applied to
names that are not numbered.


DR. NEYMAN: When using Tippett's Random Sampling Numbers you
will have to number all your names.


In regard to the question just discussed, it may be useful to
mention that in many cases every tenth house will give as good a sample
as the application of Tippett's numbers. Other methods may be used
also. It is very difficult to give a general rule for distinguishing
between reasonable precautions to ensure randomness and the attempts
to "cut a hair in two along its length." Here the research worker must
acquire some experience and use his own judgment in every practical
case. It must be emphasized, however, that the use of Tippett's numbers
does not present any difficulty at all, and that using them you are on
the safe side.


MR. WERTHEIMER: In Dr. Gini's work, did he usually make only the
mean conform?


DR. NEYMAN: He used 12 controls.


MR. WERTHEIMER: What would he use in making the samples? Would he
use the averages, the averages only?


DR. NEYMAN: The averages of 12 controls.


QUESTION: Was it ever tried making two characters the same?


DR. NEYMAN: Yes, by Professor Anderson in Bulgaria.* They
sampled villages. From a previous census they got distributions of
various characters of farms within the villages. For each village they
constructed the histograms, and then they tried to select for the sample
such villages that the histograms of several controls were
similar to those for the whole population. I think it is again a faulty
method, but there is no evidence of what happened. I don't know whether
any comparison between the characters of the sample that were not used for
its selection and the corresponding ones of the population was ever
published.


MR. KANTOR: Suppose that you have to sample the workers in
various industries in several states or other geographical areas.
You do not have any record of the unemployed, and you want a sample
that will give you the percentage of unemployed in each industry for
each of the areas. The reason for having the different areas is that
there may be economic factors that affect the unemployment rate where
there is a small part of the industry, contrasted with the case where
there is a major center of it, or where there is diversified or unified
industry. How can one go about getting a sample that would give results
equally accurate for each industry within each area?


* Oskar N. Anderson, Einführung in die mathematische Statistik
(Vienna: Springer, 1935).
DR. NEYMAN: There is no particular difficulty in approaching the
ideal of equally accurate estimates for different areas concerning the
same industry, but it may be impossible to attain, in addition, a similar
equality in accuracy for all industries. Your situation is more
complicated than those considered before. The different areas you
mention must be considered as separate populations--let us call them
partial populations. They may be and should be stratified. Denote by
m_0(i) the total number of sampling units to be selected from the ith
partial population. This number should be distributed among the strata
according to the rule I have indicated before (page 96). If this is
done, then the variance of a mean like the one in formula (5), page 96,
but now referring to the ith partial population, will be, according to
formula (7),


$$\sigma^2(i) = \frac{1}{m_0(i)\,M_0^2(i)} \Big( \sum_j M_j(i)\,\sigma_j(i) \Big)^2 \qquad (7a)$$
where the notation of Eq. (7), page 96, has been altered so that the i in
parentheses refers to the ith partial population, and the subscript j
to the jth stratum. If, as formerly, we denote by m_0 the total number
of sampling units to be selected from all the partial populations,
then this number of samplings must be distributed between those
populations so as to keep σ(i) in formula (7a) constant. This, of course,
refers to one industry only, and assumes knowledge of the σ_j(i) for
each stratum and for each partial population. The values of σ_j(i)
could be estimated from a preliminary inquiry.
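A sketch of this two-level allocation, assuming formula (7a) takes the optimum-allocation form σ²(i) = (Σ_j M_j(i)σ_j(i))² / (m_0(i) M_0²(i)), with the finite-population correction ignored. Writing A_i = Σ_j M_j(i)σ_j(i) / M_0(i), this gives σ(i) = A_i / √m_0(i), so keeping σ(i) the same for all partial populations requires m_0(i) to be proportional to A_i²; within each partial population the strata then receive numbers proportional to M_j(i)σ_j(i). The function `plan` and all figures are hypothetical, and the fractional allocations are not rounded.

```python
import math

def plan(partial_pops, m0):
    """partial_pops: for each partial population, a list of (M_j, sigma_j)
    pairs, one per stratum. m0: total number of sampling units.

    Returns (m0_i, within, sigma_i): the totals m_0(i), the allocations m_j(i)
    within each partial population, and the resulting standard errors sigma(i),
    all under the assumed form of formula (7a) and without rounding."""
    A = [sum(M * s for M, s in strata) / sum(M for M, _ in strata)
         for strata in partial_pops]
    total = sum(a * a for a in A)
    m0_i = [m0 * a * a / total for a in A]       # m_0(i) proportional to A_i^2
    within = []
    for strata, m in zip(partial_pops, m0_i):
        w = [M * s for M, s in strata]            # optimum rule inside partial population i
        within.append([m * x / sum(w) for x in w])
    sigma_i = [a / math.sqrt(m) for a, m in zip(A, m0_i)]
    return m0_i, within, sigma_i

# Hypothetical areas; each stratum is (number of units, standard deviation).
areas = [[(1000, 0.10), (500, 0.20)],
         [(2000, 0.05), (1000, 0.15)],
         [(300, 0.30)]]
m0_i, within, sigma_i = plan(areas, 900)
print([round(m) for m in m0_i], [round(s, 4) for s in sigma_i])
```

As noted above, this serves one industry at a time; running it with the σ_j(i) of another industry generally gives a different plan.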


MR. KANTOR: In attempting to get an estimate of the variability
that we are going to use in deciding the proportion that you will draw,
you will have to take a test count in each of your areas; you have it
scattered over a number of characteristics; it is no longer one
characteristic that you measure. You would have to get a test drawing and
compute actual unemployment rates for a number of industries in each of
your areas. Isn't that the only way in which you can proceed? It seems
to me that you have to take a full count.


DR. NEYMAN: I don't think so. The preliminary inquiry designed
to estimate the variability of the strata may be very small in size.
As I have already mentioned (page 98), Dr. Sukhatme has investigated
this question and found that 20-30 units of sampling out of each stratum
would be plenty. He suggests even as few as 15. And it is not necessary
to make a preliminary inquiry separately for each industry. You make one
and use it to estimate σ_j(i) for each of the industries in turn.
Substitute your estimates for the true σ_j(i) in formulas (4) and (7a),
separately for each industry. You will see that formula (4), determining
the optimum proportions of samplings within strata, will give more or less
similar results for all industries. Probably--but of this I am less
certain--the same thing will happen with formula (7a). Alternatively,
you may adjust your proportions of sampling to some single character
treated as basic. I should choose the total number of workers within
the sampling unit; it is likely to be highly correlated with the numbers
of unemployed.
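Whether a small preliminary inquiry is enough can itself be explored by simulation. The sketch below uses hypothetical strata with normally distributed units (an assumption), estimates each stratum standard deviation from only 20 units (in the spirit of the figure quoted above, not a reproduction of Dr. Sukhatme's work), allocates the main sample by the optimum rule, and reports the efficiency of that allocation relative to the one based on the true standard deviations, ignoring the finite-population correction.

```python
import random
import statistics

# Hypothetical strata: (number of units M_j, true standard deviation sigma_j).
strata = [(4000, 2.0), (2500, 6.0), (1500, 12.0), (2000, 4.0)]
M0 = sum(M for M, _ in strata)
m0 = 600          # size of the main sample
PILOT = 20        # units per stratum in the preliminary inquiry

def optimum_allocation(sds):
    """Numbers of samples proportional to M_j * sd_j, summing to m0."""
    w = [M * s for (M, _), s in zip(strata, sds)]
    return [m0 * x / sum(w) for x in w]

def variance_of_mean(alloc):
    """Variance of the stratified estimate of the mean for a given allocation,
    using the true sigma_j and ignoring the finite-population correction."""
    return sum((M / M0) ** 2 * s ** 2 / m for (M, s), m in zip(strata, alloc))

rng = random.Random(3)
pilot_sds = [statistics.stdev([rng.gauss(0, s) for _ in range(PILOT)])
             for _, s in strata]
best = optimum_allocation([s for _, s in strata])
from_pilot = optimum_allocation(pilot_sds)
efficiency = variance_of_mean(best) / variance_of_mean(from_pilot)
print([round(m) for m in from_pilot], round(efficiency, 3))
```

Errors in the estimated standard deviations enter the variance only to second order, which is why such small pilots tend to lose little; rerunning with other seeds shows how much.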


MR. KANTOR: We find, however, that there are very
great differences in the proportion unemployed, depending on the
production rate of the industry to which the workers were attached. During
a depression, the production of goods for use in further production
declines very rapidly, but the production of articles made for general
consumption declines only slightly; an area devoted principally to the
former type of production will have very high unemployment, and an area
largely devoted to the latter type of production will have small
unemployment. Is it that variability that we can test by drawing a small
preliminary sample?


DR. NEYMAN: Yes, that is quite all right. The variability you
speak of does not cause any trouble, since this is a variability between
the strata, or perhaps between the partial populations. I presume that
the distribution of industries over the country is more or less known,
and that when stratifying you will be able to distinguish areas differing
in the general character of the prevailing industries. My impression
even is that it is partly the purpose of your study to get information
concerning such areas separately. If you look closely into my formulas,
you will notice that they depend upon the variability within the partial
populations and, more particularly, within the strata. Denote by n, for
the moment, the number of workers within a unit of sampling, and by x
the number of those previously working in some particular industry and
now unemployed. If you take one particular stratum and study the units
of sampling, you may find a picture something like this:
We have seen something of that sort in an actual inquiry in Poland. The
total number of workers within the boundary of a unit of sampling is
composed of various kinds of these workers, and n is bound to be correlated
with x. Of course this correlation is largely due to the varying size
of the sampling unit.


The plan to take but one basic character as a guide has the
advantage that the preliminary inquiry might then be made very
superficial and yet be satisfactory; the enumerators might
then be asked to establish only the number of workers inhabiting the
units of sampling. But this procedure also has definite
disadvantages. First of all, if you work only with the basic character,
the data collected during the preliminary inquiry could not be included
in the main one. Next, the basic character is very likely to be
useful for assigning the numbers of samplings to separate strata of one
partial population, but I am not so certain whether this will be the case
when you try to determine the level of sampling of partial populations.
Therefore I should probably carry out the preliminary inquiry exactly
as the main one, with the only difference in size. I would estimate
each σ_j(i) separately for each industry and substitute it into
formulas (4a) and (7a). Then I would see what happens, and what would be
the accuracies of the averages that I would obtain by this or that system
of the m_0(i).


MR. FRIEDMAN: In many cases the set of characteristics that it
is desired to study includes some about which information can be obtained
with relative ease and others about which information can be secured only
through long and expensive interviews. In such cases it may be advisable
to secure information on the first set of characteristics from a large
random sample. This information may then be used to select a smaller
stratified sample from which the second type of data can be secured.
From the random sample would also be obtained weights to be used in
combining the data from the various strata of the stratified sample.


Thus, in the Study of Consumer Purchases, which is now being
conducted under the auspices of the National Resources Committee, the
Bureau of Labor Statistics, and the Bureau of Home Economics, the primary
aim is to secure information on family expenditures. The sample from
which such data are secured is, however, stratified with respect to income
(as well as other characteristics). At the same time, there are no data
on the relative frequencies of the different income classes. As a
consequence, it was necessary to obtain information on income from a
random sample of families in order to secure the weights for combining
the data from the stratified sample. In view of the extremely high
costs involved in securing the data on expenditures, and of the
relatively low costs of securing the data on incomes, it was decided to make
the random sample from which income information was obtained very much
larger than the stratified sample giving the information on expenditures.
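The way the two samples are combined can be sketched as below: the share of each income class in the large random sample serves as its weight, and the weighted sum of the class means from the small stratified sample estimates the overall mean expenditure. This only shows how such an estimate is put together under these assumptions; it does not address the question of optimum sizes. The helper name and all figures are hypothetical.

```python
import statistics
from collections import Counter

def double_sampling_mean(first_phase_classes, second_phase):
    """Combine stratum means from the small, costly sample using weights
    estimated from the large, cheap random sample.

    first_phase_classes: stratum label (e.g. income class) of every family
        in the large random sample.
    second_phase: dict mapping each stratum label to the expenditure values
        observed in the small stratified sample.
    """
    counts = Counter(first_phase_classes)
    n = len(first_phase_classes)
    missing = set(counts) - set(second_phase)
    if missing:
        raise ValueError(f"no expenditure data for classes {sorted(missing)}")
    return sum(counts[h] / n * statistics.fmean(values)
               for h, values in second_phase.items())

# Hypothetical data: 1000 families classified by income, a few interviewed in detail.
classes = ["low"] * 600 + ["middle"] * 300 + ["high"] * 100
expenditure = {"low": [20, 22, 18], "middle": [40, 44], "high": [90, 110]}
print(double_sampling_mean(classes, expenditure))   # about 34.6
```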


The question I should like to ask is whether any work has been
done that would indicate the optimum relative size of the two samples
on the assumption that the relative costs and the relevant standard
deviations are known.


DR. NEYMAN: So far as I know, nothing has been done on the
specific question that you raise. I take it, however, that in such a
case it would be necessary to conduct two preliminary inquiries, one
designed to determine the relative frequency of the different income
classes, and the other to determine, for the different strata, the
standard deviations of the item in which you are particularly interested.
The second preliminary inquiry, as I have already indicated,
need cover only a relatively small number of cases.


MR. FRIEDMAN'S QUESTION RESTATED BY DR. WILCOX: For part of the
work at least, one step was taken in trying to get a random sample using
every nth card, starting not with the first card but with a card
which itself would be the result of accident. This was the process of
finding out for a given city what proportion of the people are wage
earners and clerical workers and what proportion are at one or another
income level. That was an inexpensive survey. Then the long, laborious
process had to be followed of finding out how they spent their money in
detail, and the number of families that might respond to the more
elaborate questionnaire might have no very close relationship to the
number of families in the particular type of occupational activity or
income level. And so the question of weights comes around. What should
be the relative number secured on the random basis? Should
we take every tenth family, or, knowing in advance approximately the costs
of the operations and therefore how many schedules we are going to be
able to get on the expenditure basis, how heavy a sample should we have
taken on the random basis? What is the relative size of the random
sample, of the larger to the smaller?


DR. NEYMAN: I repeat, so far as I am aware, the question asked
has not been considered; but it is so interesting that I shall be glad
to see whether it could be answered by some simple method. If I succeed,
I shall try to publish the results.*


v 


The other group of questions that I was asked to answer refers
to standard economic problems. The persons who asked those questions
were interested to know how far statistical methods could be used to
illustrate some connections between different economic factors
developing in time. We know that there are cycles. We know that there are
inter-relations between movements of prices, of unemployment, and so on.
All those factors are developing in time, and if we plot the figures,
we observe some parallel movements, or movements in opposite directions,
and it is a question how far statistical technic, and in particular the
theory of testing hypotheses or the theory of sampling, is applicable
to such observations.


* Editor's note: In the summer of 1937, Dr. Neyman prepared a
paper dealing with the question raised by Mr. Friedman and Dr. Wilcox,
and it is slated to appear in the March 1938 issue of the Journal of
the American Statistical Association. A paper by Milton Friedman,
reporting the survey of family expenditures, along with a new method
of handling ranked data, will appear in the issue for December 1937,
under the title "The use of ranks to avoid the assumption of
normality implicit in the analysis of variance."
My opinion is that the phenomena described could and should be
studied statistically, but that the appropriate methods have not yet
been properly worked out. The procedure that is ordinarily applied seems
to be wrong. Many people feel this subconsciously, and this is probably
the reason why similar questions are being asked so frequently.


Let us make a short review of the methods that have been applied
in studying the correlations between two time series.


We start by trying to split each of the series into several parts,
which we arbitrarily assume to be additive. One of these parts is the
trend, which we estimate perhaps by fitting a low-order parabola to the
whole series available. The next part is the "business cycle." The
third part is the "seasonal variation," which we frequently estimate by
calculating moving averages. Finally, the remainder is considered to
arise from random causes, and we concentrate on the question whether
such a remainder in one of the variables is correlated with that in
some other.


All this procedure seems to me very artificial and arbitrary.
How do we know that the trend, the business cycle, and the rest are
connected together additively? Why should all the systematic variation,
everything except the random component, be represented by smooth curves?
Finally, even if all the hypotheses described were true, we must notice
that the residuals calculated in the above manner are not equivalent to
direct observations but are in some way or other related. Consequently,
if they are used for testing hypotheses, some novel methods, not yet
available, must be devised. This, however, does not seem to be a
fruitful field of research.
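The remark that residuals obtained in this way are related to one another can be seen in a small simulation. The sketch below starts from independent normal observations (an assumption made for illustration, with no trend, cycle, or seasonal component at all), subtracts a centred five-term moving average, one of the smoothing steps described above, and compares the lag-one correlation of the raw series with that of the residuals.

```python
import random
import statistics

def centred_moving_average(x, window):
    """Centred moving average of odd length `window`; the ends are dropped."""
    h = window // 2
    return [statistics.fmean(x[t - h:t + h + 1]) for t in range(h, len(x) - h)]

def lag1_autocorrelation(r):
    mean = statistics.fmean(r)
    num = sum((r[t] - mean) * (r[t + 1] - mean) for t in range(len(r) - 1))
    den = sum((v - mean) ** 2 for v in r)
    return num / den

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(20000)]      # independent "observations"
window = 5
h = window // 2
smooth = centred_moving_average(x, window)
residuals = [xi - si for xi, si in zip(x[h:len(x) - h], smooth)]

rho_raw = lag1_autocorrelation(x)
rho_resid = lag1_autocorrelation(residuals)
print(round(rho_raw, 3), round(rho_resid, 3))
```

For a five-term moving average the theoretical lag-one correlation of such residuals is -0.3, although the original observations are independent; tests that treat the residuals as independent observations would be misled accordingly.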


In my opinion the whole problem of time series must be treated
from a point of view that is quite different from the traditional one
described. As a matter of fact, we are already witnessing some
attempts on these new lines. Work is being carried on in two directions.
One direction is represented by the authors who try to formulate
intelligible hypotheses concerning the machinery of economic phenomena
and to express them in terms of either differential equations or
equations in finite differences. Solving those equations, they obtain
measures of various interacting economic factors as functions of time.
The corresponding curves (see, for instance, diagram I, page 116, in the
conference on time series) resemble in some respects the ones that may
be given by observation, but there is one essential characteristic in
which there is a distinction between that theory and the observations:
the theoretical curves are too regular. And this is just the circumstance
indicating the need for a new branch of mathematical statistics
that must be developed for treating time series. The excessive regularity
of the theoretical curves is an obvious result of the circumstance
that there is nothing variable in the conditions from which they are deduced.
Now in actual life, the rigid economic hypotheses that have led to the
ordinary differential equations may be satisfied approximately but not
exactly. It may be, for example, that the purchases of the rural
population in each year are, as a rule, nearly proportional to the
consumption of agricultural produce in the preceding year; but it is
obviously impossible to expect that the coefficient of proportionality
should be an absolute constant.


It follows that if we desire a better agreement between
observations and the dynamic theory of economics, the theory must be based
on differential equations or equations in finite differences of a
special kind: the coefficients in these equations, at least some of
them, must not be rigid constants but random variables. The solutions
of such a system of equations would not be represented by definite curves,
even if the initial conditions were fixed. Instead, for any fixed
moment t, we should have a probability distribution of the variables
concerned, depending on the values that they had in previous moments.
Having a system of dynamic hypotheses, the solution of the random
differential equations resulting from them, and the observational data, we
could test whether the agreement is satisfactory. This is, in my opinion,
the right way of treating time series.
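A minimal sketch of the idea, with invented coefficients: a second-order difference equation x_t = a_t x_{t-1} - B x_{t-2}, solved once with the "rigid" constant a_t = A and then many times with a_t drawn at random around A. With fixed initial conditions the first gives a single regular curve; the second gives, for each t, a distribution of x_t. The model and all numbers are hypothetical and are not taken from any particular economic theory.

```python
import math
import random
import statistics

A, B = 1.6, 0.95      # hypothetical "rigid" coefficients of x_t = A x_{t-1} - B x_{t-2}
NOISE = 0.1           # hypothetical spread of the random coefficient a_t around A
T = 50
x0, x1 = 0.0, 1.0     # fixed initial conditions

def path(rng=None):
    """Solve the difference equation up to t = T; with rng=None the
    coefficient is the constant A, otherwise a_t = A + normal noise."""
    xs = [x0, x1]
    for _ in range(2, T + 1):
        a_t = A if rng is None else A + rng.gauss(0, NOISE)
        xs.append(a_t * xs[-1] - B * xs[-2])
    return xs

deterministic = path()
rng = random.Random(42)
n_paths = 5000
finals = [path(rng)[-1] for _ in range(n_paths)]
mean_T = statistics.fmean(finals)
sd_T = statistics.stdev(finals)
print(f"rigid solution at t={T}: {deterministic[-1]:.3f}")
print(f"random coefficients: mean {mean_T:.3f}, standard deviation {sd_T:.3f}")
```

Because a_t is drawn independently of the past, the mean of x_t follows the rigid curve, while the spread around it does not vanish; it is this distribution, not a single curve, that would be confronted with the observations.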


Unfortunately the theory of random equations is not yet ready or,
more precisely, it is only just being started; and this is the other
direction in which the theoretical work concerning time series is being
carried on.


MR. STOCK: Would you give a reference on this last subject you
have been talking of?


DR. NEYMAN: To my knowledge, the first problem of this kind was
treated by Professor Hotelling.* It consisted in finding the most
probable size of a population at a moment between two censuses. Seeing that
the most difficult problem in scientific research consists in noticing
that there is a problem at all and in formulating it properly, I think
that this paper is a very brilliant achievement of its author.


The theory of random equations has been discussed by S. Bernstein*
in his report to the International Mathematical Congress in Zurich, 1932,
and in a series of papers published in the Proceedings of the Leningrad
Academy. Some of them are in French, others in Russian with extensive
French summaries.
* The reader is now referred to the conference on time series. The
references to Hotelling's and Bernstein's papers will be found on
page 119.
MR. STOCK: In regard to the first series of questions, is there
any way of estimating the effective size of the sample in order that we
may estimate a standard error from these stratified samples? If we have
in our sample a thousand persons, but they are living in only 500 houses,
what is the effective size of that?






DR. NEYMAN: The size of the sample is not the number of persons,
but the number of random selections made to form the sample. Therefore,
if you select houses at random so that your unit of sampling is a house,
then it is the number of houses that is relevant. If towns are selected,
only 25, and those 25 towns contain, say, 25,000,000 people, then the size
of the sample will be 25, not 25,000,000.


In order to estimate the effective size of the sample necessary
to get a given accuracy of the estimated means, you may apply the formula
for the variance σ² of X̄ as given in Eq. (7), page 96. Of course you will
want the estimates of the σ_i; those could be found from the preliminary
inquiry.
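A sketch of that calculation, assuming Eq. (7) has the optimum-allocation form σ² = (Σ_j M_j σ_j)² / (m_0 M_0²), with the finite-population correction ignored; under that assumption m_0 = (Σ_j M_j σ_j / M_0)² / σ². In line with the answer to Mr. Stock, M_j, σ_j, and the result all refer to sampling units (houses, say), not persons. The function name and the figures are hypothetical.

```python
import math

def required_sample_size(strata, target_se):
    """Number of sampling units (e.g. houses, not persons) needed so that the
    standard error of the estimated mean does not exceed target_se.

    strata: (M_j, sigma_j) pairs, with sigma_j per sampling unit, estimated
        from a preliminary inquiry.
    Assumes optimum allocation and sigma^2 = (sum_j M_j sigma_j)^2 / (m0 M0^2),
    i.e. the finite-population correction is ignored."""
    M0 = sum(M for M, _ in strata)
    S = sum(M * s for M, s in strata) / M0
    return math.ceil((S / target_se) ** 2)

print(required_sample_size([(5000, 3.0), (5000, 2.0)], 0.25))   # 100
```

Halving the required standard error quadruples the number of sampling units, whatever the number of persons they contain.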
TIME SERIES ANALYSIS AND SOME RELATED STATISTICAL PROBLEMS IN ECONOMICS
A conference with Dr. Neyman in the auditorium of the Department
of Agriculture, 10th April 1937, 11 a.m., Dr. Charles F. Sarle
presiding.

























I shall speak of two different ways of treating economic problems by
means of statistical methods. One could be called empirical, another a priori.
The first is very popular, has yielded many important results in the past, and
is likely to be useful for a considerable time in the future. The other method
is much less popular and so far has not proved very efficient. Still, I shall
rather condemn the empirical method and praise the a priori. Both methods are
designed for the same purpose: to make predictions concerning economic processes
as described by various figures such as prices, incomes, supply, and demand.
Their distinctive characteristics are as follows.







With the empirical method the dominant hypotheses concern the final result
of the work of the economic machinery in any given situation, without paying
attention to the machinery itself. On the other hand, treating the problem
a priori, we start by formulating some hypotheses about the machinery, and then
work out the results that can be produced from those hypotheses.


Before giving examples taken from the current literature and criticising
the two methods of approach, I shall try to emphasize the distinction between
them by quoting certain instances from the history of astronomy, which, you
know, could be roughly divided into two phases, before Newton and after Newton,
with a transitional period marked by the names of Copernicus and Kepler.


All, or almost all, the work done in the first phase was independent of
any hypothesis concerning the machinery of movements of celestial bodies. On
the other hand, the authors firmly believed that, whatever this machinery may be,
it must have produced only circular movements. The circles could have been
stationary with their centres at the Earth, or they could have been moving; but
then their centres should have moved along other circles, the final circle being
centred at the Earth. The astronomers of this phase aimed at the establishment
of a complex system of circular motions that would agree with the observations.
The predictions based on such systems were frequently very successful.

After the works of Copernicus, abolishing the dogma that the centre of
the last circle must be at the Earth, and after those of Kepler, dealing
similarly with the other dogma of circularity, the way for Newton's theory of
gravitation was cleared and modern astronomy was begun. This is distinguished
by the circumstance that, instead of making assumptions concerning what must be
the nature of the observable movements, it formulates hypotheses concerning the
machinery that may have produced these motions. The deductions from these
hypotheses were then compared with observation. This is what I have called the
a priori method. You know that in the case of astronomy this proved to be
much more efficient than the empirical method. I am induced to think that
something similar will prove to be true in economics.
Empirical statistical research in economics may be illustrated by a
recent work of E. C. Rhodes.*


Various institutions in different countries are engaged in computing
and in publishing indices measuring the extent of various economic activities
in terms of some conventional unit, usually representing the same measure at
some fixed moment. Apart from such indices, like those of building activity,
circulation of currency, electric power, and production of various
commodities--indices which I shall call specific--many economists find it
necessary to consider something that could be termed the index of general
business activity. The object of Rhodes' paper was to supply a new method of
determining such an index of business activity, and he used it to calculate
the values of that index for Great Britain.


His way of reasoning was approximately as follows. Consider a number
of specific indices and their values as calculated for moments t_1, t_2, ..., t_n.
Let those specific indices be X_1, X_2, ..., X_s. The changes in the values of
those indices occurring in time depend presumably on a very great number of
economic factors. One of them, supposed to influence all of the specific
indices, he calls the factor of business activity, denoted by I.


He does not specify any properties of this factor; that is to say, he
does not define its nature. Nor does he make any hypothesis concerning the
machinery of economic processes involving the influence of $I$ on the $X_i$, but he
does assume a certain particular form of dependence between the $X_i$ and $I$. In
other words, he makes the assumption that, whatever the machinery of economic
processes may be, the final results of this machinery must express themselves by
equations of a particular kind connecting the $X_i$ with $I$ and also with some other
unknown and equally mysterious factors.








You will have no difficulty in recognizing here the same way of approaching
the problem as we have seen in the first phase of the history of astronomy.


The method of Dr. Rhodes is, I dare say, familiar, and probably many of
you would easily guess the nature of his equations connecting the $X_i$ with the
mysterious factors. They are linear equations with some unknown constant
coefficients. Denoting severally the unknown economic factors influencing the
$X_i$ by $\xi$ with subscripts, we may write the equations assumed by Dr. Rhodes in
the form


$$X_i = a_i I + b_{i1}\xi_1 + b_{i2}\xi_2 + \cdots + b_{im}\xi_m, \qquad i = 1, 2, \dots, s. \tag{1}$$


The values of $I$ and the $\xi$'s are changing with time, and it is assumed
that they do so independently of each other; the values of each of the $X_i$
are supposed to change in accordance with the above equation. In the analogy
with astronomy, this assumption corresponds to the motion of a celestial body,
which motion may be split into a number of cyclical motions. But there is
another assumption that Dr. Rhodes makes which could be considered as roughly
corresponding to that of the last of Ptolemy's circles centring at the Earth:
namely, that by making some linear combinations of the equations (1) we can
manage to eliminate most of the $\xi$'s and so obtain the so-called subsidiary
equations of the following form:

$$\begin{aligned}
A_{11}X_1 + A_{12}X_2 + \cdots + A_{1s}X_s &= B_1 I + C_1\xi_1,\\
A_{21}X_1 + A_{22}X_2 + \cdots + A_{2s}X_s &= B_2 I + C_2\xi_2,\\
&\;\;\vdots
\end{aligned} \tag{2}$$

* E. C. Rhodes: "The construction of an index of business activity", Journal
of the Royal Statistical Society, 100, pp. 18-66, 1937.
The final formula for the calculation of approximate values of $I$, for
each moment for which the values of the $X_i$ are available, represents a weighted
average of the $X_i$, the weights depending in a certain manner on the coefficients
of the subsidiary equations (2). Consequently, the possibility of calculating
the values of $I$ depends on that of estimating the coefficients in (2). Dr.
Rhodes gives a way of estimating them. Of this I shall notice only that it is
based solely on the values of the variances and product moments of the $X_i$. That is
to say, if we use the observed values of the $X_i$, say

$$X_i(1),\; X_i(2),\; \dots,\; X_i(n), \qquad i = 1, 2, \dots, s, \tag{3}$$

where $X_i(t)$ denotes the value of $X_i$ observed at the moment $t$, to calculate*

$$\bar X_i = \frac{1}{n}\sum X_i(t), \tag{4}$$

$$s_i^2 = \frac{1}{n}\sum \bigl(X_i(t) - \bar X_i\bigr)^2, \tag{5}$$

$$p_{ij} = \frac{1}{n}\sum \bigl(X_i(t) - \bar X_i\bigr)\bigl(X_j(t) - \bar X_j\bigr), \tag{6}$$

then all the calculations suggested by Dr. Rhodes, perhaps in a different form,
could be carried out to obtain estimates of the coefficients in (2), these
estimates depending only on the values of the $s_i$ and $p_{ij}$.
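The quantities (4) to (6), on which the estimates are said to depend, can be computed as in the following minimal Python sketch. The divisor $n$ (rather than $n - 1$) is an assumption made here, and the function names and the toy data are hypothetical, chosen only so that the results can be checked by hand.

```python
from typing import List, Sequence


def mean(xs: Sequence[float]) -> float:
    """Equation (4): arithmetic mean over t = 1, ..., n."""
    return sum(xs) / len(xs)


def product_moment(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Equation (6), and equation (5) when xs is ys: mean product of deviations (divisor n)."""
    if len(xs) != len(ys):
        raise ValueError("both series must have the same length n")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def moment_summary(series: Sequence[Sequence[float]]):
    """Return the means, the variances s_i^2 and the matrix of product moments p_ij."""
    means = [mean(x) for x in series]
    p: List[List[float]] = [[product_moment(xi, xj) for xj in series] for xi in series]
    variances = [p[i][i] for i in range(len(series))]
    return means, variances, p


# Hypothetical toy data: s = 3 specific indices observed at n = 4 moments.
X = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
means, variances, p = moment_summary(X)
print(means)             # [2.5, 5.0, 2.5]
print(variances)         # [1.25, 5.0, 1.25]
print(p[0][1], p[0][2])  # 2.5 -1.25
```

Whatever procedure is then applied to the $p_{ij}$, it sees nothing of the data beyond this summary, which is the point developed in the following paragraphs.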


The method of Dr. Rhodes has the advantage that it cannot fail to produce
a result, that is to say, some result. Similarly, the efforts of early
astronomers always produced a system of circles and velocities, bringing their
theory into fair agreement with the observed motions of the stars and planets.
And the objections to both methods are similar. They are summed up in the question
of the reality of the objects with which the respective theories are dealing. Do
the epicycles actually exist? Is there any reality corresponding to Dr. Rhodes'
index of business activity and other factors?


* Unless otherwise indicated, the summation $\sum$ will be taken over $t$ running
from 1 to $n$.
"Reality" is a dangerous conception. Do atoms represent a reality?
This may be questioned. But whether they do or not, the chemist treats them
as such; in other words, he postulates their reality. Therefore I shall
formulate the two above questions a little differently: is there any advantage
in postulating the reality of the epicycles in one case, and of the various
"economic factors" (such as, for example, the general business activity) in
the other?


We are driven here to a more general question: what advantages might we
expect at all from postulating the reality of this or that conception?
But this question does not seem to be very difficult. Its answer is readily
obtained when we think of conceptions that are strongly established in science
and in the reality of which we have forgotten to doubt. The advantage of
postulating the reality of atoms in general, and of the atom of hydrogen in
particular, consists in the fact that their definition (in the textbooks of
chemistry treated as discovered properties) permits us (i) to identify the
particular atoms in various conditions and (ii) to predict the results of
various experiments. It was possible to bring the definition into such
remarkable concordance with the facts of observation that various complicated
checks and rechecks are matters of history.











We may start with a quantity of matter conforming with the definition
of hydrogen, bring it into contact with oxygen, and produce a quite new kind
of matter, water. This again could be transformed into a number of other
substances. Yet the atoms of hydrogen do not lose their identity: they can
be obtained again in their previous state and, apart from an experimental error,
in the previous quantity.


Similarly, there is an advantage in postulating the reality of forces in
general and of gravitation in particular. Here again we have the possibility
of identification, of generality, and of relatively easy predictions in
an enormous number of cases, including the movements of planets.


If, armed with such observations, we turn to the problem treated by Dr.
Rhodes, or indeed to any other problem treated empirically, we shall see that
there are no similar advantages in postulating the realities that are usually
postulated in empirical methods. If there are any advantages at all, they
must be different. But I cannot perceive them.


If we do not build any system of hypotheses concerning the machinery of
the processes under discussion, it is eo ipso difficult to give definitions of the
assumed realities that will permit one to distinguish them from others. It
follows that, as a general rule, in any two given cases there may be
difficulties with the identification of the same realities. And I cannot see any
use in science for "realities" that are impossible to identify. To illustrate my
point, I shall assume for a moment that the economic factors considered by Rhodes
are real and that in some three different countries, A, B, and C, a certain set
of three indices $X_1$, $X_2$, and $X_3$ are connected with the factors $I$ and $\xi_i$
by the following equations:
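Country B

$$\begin{aligned}
X_1 &= \tfrac{1}{2}\sqrt{3}\,I + \tfrac{1}{2}\sqrt{5}\,\xi_1,\\
X_2 &= \frac{1}{2\sqrt{3}}\,I + \frac{3}{2\sqrt{5}}\,\xi_1 + \sqrt{\tfrac{22}{15}}\,\xi_2,\\
X_3 &= \frac{2}{\sqrt{3}}\,I + \sqrt{\tfrac{10}{33}}\,\xi_2 + \frac{2}{\sqrt{11}}\,\xi_3.
\end{aligned} \tag{8}$$

Country C

$$\begin{aligned}
X_1 &= \xi_1 + \xi_2,\\
X_2 &= \xi_1 + \xi_3,\\
X_3 &= \xi_2 + \xi_3.
\end{aligned} \tag{9}$$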





You will see that the situation in the three countries is entirely
different and that in country C there is no general factor $I$ at all. Yet, if
all the factors, $I$ and the $\xi_i$, are mutually independent and vary about zero
with standard deviations equal to unity, then, as it is easy to calculate,
the variances of all the $X_i$ and also their product moments will be identical
in all three countries, namely

$$E(X_1^2) = E(X_2^2) = E(X_3^2) = 2, \qquad E(X_1X_2) = E(X_1X_3) = E(X_2X_3) = 1. \tag{10}$$
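The claim in (10) can be checked directly. With independent factors of mean zero and unit S.D., the matrix of variances and product moments is $\Lambda\Lambda^{\mathsf T}$, where row $i$ of $\Lambda$ holds the coefficients of $(I, \xi_1, \xi_2, \xi_3)$ in $X_i$. The minimal Python sketch below reads those coefficients as $X_1 = \tfrac12\sqrt3\,I + \tfrac12\sqrt5\,\xi_1$, $X_2 = I/(2\sqrt3) + 3\xi_1/(2\sqrt5) + \sqrt{22/15}\,\xi_2$, $X_3 = 2I/\sqrt3 + \sqrt{10/33}\,\xi_2 + 2\xi_3/\sqrt{11}$ for country B and $X_1 = \xi_1 + \xi_2$, $X_2 = \xi_1 + \xi_3$, $X_3 = \xi_2 + \xi_3$ for country C; country A is left out of this sketch.

```python
import math

SQRT = math.sqrt

# Coefficients of (I, xi_1, xi_2, xi_3) in X_1, X_2, X_3.
COUNTRY_B = [
    [SQRT(3) / 2, SQRT(5) / 2, 0.0, 0.0],
    [1 / (2 * SQRT(3)), 3 / (2 * SQRT(5)), SQRT(22 / 15), 0.0],
    [2 / SQRT(3), 0.0, SQRT(10 / 33), 2 / SQRT(11)],
]
COUNTRY_C = [
    [0.0, 1.0, 1.0, 0.0],  # no general factor I: the first column is zero
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]


def product_moments(loadings):
    """E(X_i X_j) = sum_k L[i][k] * L[j][k] for independent factors with mean 0 and S.D. 1."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in loadings]
            for row_i in loadings]


for name, loadings in (("B", COUNTRY_B), ("C", COUNTRY_C)):
    moments = product_moments(loadings)
    print(name, [[round(v, 10) for v in row] for row in moments])
# B [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
# C [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
```

The two matrices agree although one country has a general factor and the other has none, so no procedure that uses only variances and product moments can tell them apart.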


Thus the method of Dr. Rhodes, if applied to data from any of the
three countries, is bound to give the same results, apart from some
fluctuations due to random variation. Moreover, if all the variables, $I$ and
the $\xi_i$, besides being independent, happen to be normally distributed, then
the distributions that the $X_i$ will follow in countries A, B, and C will be
completely identical in all respects, and not only in their variances and
product moments. It follows that in such a case not only the method produced
by Rhodes, but also any other imaginable method, would not enable one to
distinguish between the situations as described by (7), (8), and (9). Yet they
are greatly different, and if the equations (7) and (8) were known, they would
yield different formulas for calculating $I$, namely

$$\text{For country A:}\quad I = \text{const.}\,(X_1 + X_2 + X_3), \tag{11}$$

$$\text{For country B:}\quad I = \text{const.}\,(X_1 - X_2 + 2X_3). \tag{12}$$


For country C, similar calculations would be useless, since no common factor
exists. However, Dr. Rhodes' method would provide means for its approximate
calculation, and by the same formulas as for countries A and B!
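Formulas (11) and (12) fix $I$ only up to a constant factor. One way to arrive at formulas of exactly this shape, assumed here rather than taken from the lecture, is the least-squares linear predictor of $I$ from the $X_i$, whose weights are proportional to $\Sigma^{-1}a$, where $\Sigma$ is the common matrix (10) and $a$ is the vector of coefficients of $I$. The minimal Python sketch below recovers the weights $(1, -1, 2)$ of (12) from the coefficients of $I$ in country B, $(\tfrac12\sqrt3,\ 1/(2\sqrt3),\ 2/\sqrt3)$; for country A it uses equal coefficients, a hypothetical stand-in that is consistent with (11).

```python
import math

# Common matrix of variances and product moments, equation (10).
SIGMA = [[2.0, 1.0, 1.0],
         [1.0, 2.0, 1.0],
         [1.0, 1.0, 2.0]]


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve m @ w = rhs by Cramer's rule (adequate for this small, well-conditioned system)."""
    d = det3(m)
    w = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = rhs[i]
        w.append(det3(mc) / d)
    return w


def predictor_weights(coef_of_I):
    """Weights Sigma^{-1} a of the least-squares linear predictor of I, scaled so the first is 1."""
    w = solve3(SIGMA, coef_of_I)
    return [round(wi / w[0], 10) for wi in w]


coef_B = [math.sqrt(3) / 2, 1 / (2 * math.sqrt(3)), 2 / math.sqrt(3)]  # coefficients of I in (8)
coef_A = [1.0, 1.0, 1.0]  # hypothetical: equal coefficients of I in country A

print(predictor_weights(coef_A))  # [1.0, 1.0, 1.0]
print(predictor_weights(coef_B))  # [1.0, -1.0, 2.0]
```

Since $\Sigma$ is the same in all three countries, only knowledge of the equations themselves, and never the observed data, decides between these weights.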


Seeing all this, one is led to ask: what is gained by the assumption of
the reality of various undefined factors, and what is gained by the methods of
their calculation?
The situation would be quite different if the factors were defined in
a way permitting their identification and a more or less direct measurement.
Even if the distributions of the $X_i$ in my three countries were identical,
there might then be a possibility of distinguishing between the prevailing
situations on some other grounds. But then the method of approaching the
problem would not be what I call an empirical one.*


I have dwelt on the work of Dr. Rhodes partly because it is such a
good example of a purely empirical method of approach and partly because it
was published only recently, so that I remember it in detail. But it is one of
many similar ones. If you consider the method of the so-called confluence analysis,
advanced by Ragnar Frisch, you will find that it is open to almost identical
criticism. Quite similar objections apply also to most of the popular methods
of dealing with time series. We usually make arbitrary assumptions concerning
not the nature of the economic processes but the nature of the functions of
time representing their results. We say, namely, that the time series that
we observe are sums of four independent components: the secular trend, the
business cycle, the seasonal variation, and the random residual. Next we
postulate that certain algorithms are able to eliminate one or more of these
components and so to estimate the others. We apply the algorithms and obtain
some results, but I do not think that they will help much in the process of
building a new theoretical economics, in the sense in which we
speak of theoretical astronomy.


All that these results can give is comparable to what the works of
the early astronomers gave: a multifold but unsystematic knowledge of many
unconnected facts. This, of course, is very valuable.


Let us now turn to the other kind of statistical research in economics.
I hesitated a little before the adjective "statistical" as used just now;
perhaps it would be more proper to say "mathematical", but I think that, if not the
present, then the future will justify "statistical". At present, however, there
is not much statistics in the work. It consists chiefly in putting down a few
hypotheses of the kind that we frequently see in analytical mechanics, concerning
certain economic magnitudes and expressible by means of equations either in
differentials or in finite differences, and in deducing their consequences.


A good example of this kind of research is provided in the first part of
* I want to emphasize that the above criticism of the empirical method of
approach, illustrated in Dr. Rhodes' treatment of the problem of the index of
business activity, does not imply in any way that, in my opinion, the calculation
of such an index is useless. As a conventional measure of changes occurring in
a number of particular economic activities it is probably useful. But to be
useful, it must be clearly defined in one way or another, as, for example, in the
case of the index published in the "Economist": a certain weighted average of a
number of specific indices with some fixed weights. What I object to is the
postulation of some really existent, undefined, and unidentifiable factors, among
them business activity; the postulation of the form of the equations connecting
them with the observable indices; and the attempt to measure what is not defined
and impossible to identify.
a paper by Ragnar Frisch,* in which the author formulates certain simple
hypotheses about the exchange of products between the urban and rural
populations. The former is personified by a shoemaker, the latter by a farmer.
As a first stage in the study it is assumed that the amounts of money $x_t$
and $y_t$ spent during the $t$-th year by the shoemaker on farm products, and by
the farmer on shoes, are proportional to their own sales during the preceding
year, so that


$$x_t = a\,y_{t-1}, \qquad y_t = b\,x_{t-1}, \tag{13}$$

where $a$ and $b$ are some positive constants.
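To see the kind of path that (13) generates, note that substituting one equation into the other gives $x_t = ab\,x_{t-2}$ and $y_t = ab\,y_{t-2}$, so each series is multiplied by the same factor every second year. The minimal Python sketch below iterates (13); the values $a = 2$, $b = 0.5$, $x_0 = y_0 = 1$ are illustrative assumptions, not Frisch's, chosen so that $ab = 1$ and the two-year swing is easy to see.

```python
def exchange_13(a: float, b: float, x0: float, y0: float, years: int):
    """Iterate (13): x_t = a * y_{t-1}, y_t = b * x_{t-1}, starting from (x0, y0)."""
    xs, ys = [x0], [y0]
    for _ in range(years):
        x_prev, y_prev = xs[-1], ys[-1]
        xs.append(a * y_prev)
        ys.append(b * x_prev)
    return xs, ys


xs, ys = exchange_13(a=2.0, b=0.5, x0=1.0, y0=1.0, years=4)
print(xs)  # [1.0, 2.0, 1.0, 2.0, 1.0]
print(ys)  # [1.0, 0.5, 1.0, 0.5, 1.0]
```

With $ab > 1$ the same alternation would sit on a geometric growth, and with $ab < 1$ on a geometric decline; in no case does the path show anything irregular.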


Frisch solves those equations and discusses all the possible
consequences that they may involve. Later on he mentions that the original scheme
of the exchange is probably too simple, and he makes an additional hypothesis,
expressible by saying that the purchases of each of the two parties are
influenced by their mutual indebtedness. Denote by

$$G_t = \sum_{i=1}^{t-1} (y_i - x_i) \tag{14}$$

the amount of money owed by the farmer to the shoemaker at the beginning of
the $t$-th year. Then it is assumed that


$$x_t = a\,y_{t-1} + c\,G_t, \qquad y_t = b\,x_{t-1} + d\,G_t. \tag{15}$$


These new equations are solved and the implications discussed. I am
not an economist, and it is difficult for me to judge how far those particular
hypotheses are likely to cover the essential part of the process of exchange
between the town and the village. I suspect that the situation is
oversimplified. But, however that may be, it is my opinion that, if a satisfactory
general economic theory is ever built at all, it will emerge out of
such oversimplifying hypotheses.
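The scheme with indebtedness can be sketched in the same way. The minimal Python sketch below assumes the form $x_t = a\,y_{t-1} + c\,G_t$, $y_t = b\,x_{t-1} + d\,G_t$, with the debt $G_t$ of (14) updated as $G_{t+1} = G_t + (y_t - x_t)$; the exact form and sign conventions of Frisch's equations (15) are an assumption here, as are the parameter values. With $c = d = 0$ it reduces to (13).

```python
def exchange_with_debt(a, b, c, d, x0, y0, years):
    """Iterate an assumed form of (15):
        x_t = a * y_{t-1} + c * G_t,   y_t = b * x_{t-1} + d * G_t,
    where G_t = sum_{i=1}^{t-1} (y_i - x_i) is the farmer's debt, equation (14).
    The sign conventions for c and d are an assumption of this sketch.
    """
    xs, ys, debts = [x0], [y0], []
    debt = 0.0  # G_1 is an empty sum
    for _ in range(years):
        debts.append(debt)
        x_t = a * ys[-1] + c * debt
        y_t = b * xs[-1] + d * debt
        xs.append(x_t)
        ys.append(y_t)
        debt += y_t - x_t  # G_{t+1} = G_t + (y_t - x_t)
    return xs, ys, debts


xs, ys, debts = exchange_with_debt(a=1.0, b=2.0, c=0.5, d=0.0, x0=1.0, y0=1.0, years=3)
print(xs)     # [1.0, 1.0, 2.5, 2.25]
print(ys)     # [1.0, 2.0, 2.0, 5.0]
print(debts)  # [0.0, 1.0, 0.5]   (G_1, G_2, G_3)
```

Setting $d = 0$ and $c > 0$, as in the example above, corresponds to the case in which the farmer's debt affects only the shoemaker's purchases.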





I must, however, mention a defect in the above treatment. This will
be apparent if we consider the curves representing the year-to-year
variations of $x_t$ and $y_t$ as determined either by (13) or by (15), or by any other
system of ordinary equations. Diagram I on the next page gives the graphs
of $x_t$ and $y_t$ forming a solution of the following system:





X4 = api Fees Pel 


yt = bpp X+o4 


where $x_0 = y_0 = 1$. I do not know whether the
system (16) has any advantage over (15) from the economic point of view, but
I prefer it because it cannot lead to negative values of $x_t$ and $y_t$. Looking
at the diagram (p. 116) we see at once that it cannot represent the movements
of any sort of living business. The curves are too stiff. The zigzags may
perhaps be mistaken for what in ordinary time series is treated as "random
residuals," but only at first sight. They are distinctly too regular for
accidental occurrence.








* Ragnar Frisch: "Circulation planning", Econometrica 2, pp. 258-336, 1934.
The coordinates of the points will be found on page 125.
It is obvious that, whatever system of ordinary equations in finite
differences we take, the curves so determined will all be of the same kind:
they will be too regular to represent the movements in economic life. It
follows that ordinary equations in finite differences, and all integral and
differential equations, are not the proper tool for dealing with economic
problems. We must invent something else.


To describe what I think is the proper tool, I have prepared diagram
II (next page), representing the variations in $x_t$ and $y_t$, the yearly
purchases of the shoemaker and the farmer from each other. It is easily seen
that the curves in this second diagram differ essentially from the curves
in the former one (p. 116) in that they are not "dead". If the former were
shown to any statistician, he would not hesitate to state that it could not
represent any real economic process, whatever its kind. This is not true
of the curves in diagram II, as many actually observed time series
look very similar: you may distinguish here something like an ascending trend,
two cycles, and also distinctly random fluctuations. Therefore, if any
mathematical theory could have produced those curves, we should not reject
it without further investigation. At least we should not reject it for the
same reasons that cast doubt on whether ordinary equations in finite
differences, such as (15) or (16), could be used directly to represent the
machinery of real economic processes.





As a matter of fact, the curves in diagram II did come from some
mathematical scheme, and this differs but little from the system of equations
(16).


We may notice, just by looking at them more carefully, that the
equations (16) are not likely to represent any real process. Even assuming the
very simplified situation where $x_t$ and $y_t$ represent the yearly mutual
purchases of only one shoemaker and only one farmer, respectively, we should
hardly believe that they could adjust their purchases so as to satisfy
equations (16) exactly. It may be an intended policy of the farmer to purchase
from the shoemaker exactly as many shoes (in monetary units) as he sells him
food, but it is almost certain that in practice there will be deviations
from this intended policy. The same is true of the shoemaker. Therefore
equations like (16) may be considered as expressing the general tendency of
the parties; but if they are intended to represent what really happens in
practice, they must be modified to include a random element, which is the
necessary attribute of anything concerning living societies. The symbols
that in these equations represent the constants (some of them, at least)
must represent random variables. And this is exactly the way in which the
curves of diagram II were obtained. It was in fact assumed there that
$$y_t = (1 + \beta)\,x_{t-1}, \tag{17}$$

where $\alpha$ and $\beta$ are independent variables, each following a normal law
about zero with standard deviations $\sigma_1 = 0.2$ and $\sigma_2 = 0.1$ respectively.
The values at the origin were assumed to be as formerly, $x_0 = y_0 = 1$. The
technique of obtaining the curves of diagram II consists in the following. We
start by reading from tables of normal deviates* a number of figures that might
have been the
* See the footnote on page 79 referring to the tables of Mahalanobis.
independently observed values of a normal variable varying about zero with
S.D. equal to unity. Dividing the first 31 of these numbers by 5 (since
$\sigma_1 = 0.2$) and the following 31 by 10 (since $\sigma_2 = 0.1$), we obtain what might
have been the values of $\alpha$ and $\beta$ respectively. Substituting those values in
turn in (17), the consecutive values of $x_1, y_1, x_2, y_2, \dots, x_{31}, y_{31}$ were
obtained, and those give the curves of diagram II. A curve of this kind was
obtained by Ragnar Frisch (loc. cit., p. 271). It corresponds to the system
of equations (15) subjected to "erratic shocks." These "shocks" certainly
mean a random variation of some of the coefficients in the equations, but
the details are not given by the author.
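The procedure just described can be imitated with a pseudo-random generator in place of printed tables of normal deviates. In the minimal Python sketch below, `random.Random.gauss` supplies the unit normal deviates. The second equation is $y_t = (1 + \beta)x_{t-1}$. The first equation is assumed, hypothetically, to have the analogous form $x_t = (a + \alpha)y_{t-1}$ with $a = 1$. The function name is likewise hypothetical.

```python
import random


def simulate_diagram_ii(sigma_alpha=0.2, sigma_beta=0.1, a=1.0, x0=1.0, y0=1.0,
                        years=31, seed=None):
    """Imitate the construction of diagram II with pseudo-random normal deviates.

    Assumed system: x_t = (a + alpha_t) * y_{t-1}   (hypothetical first equation)
                    y_t = (1 + beta_t)  * x_{t-1}   (second equation of (17))
    """
    rng = random.Random(seed)
    # 2 * years unit normal deviates stand in for the printed table.
    deviates = [rng.gauss(0.0, 1.0) for _ in range(2 * years)]
    alphas = [z * sigma_alpha for z in deviates[:years]]  # S.D. 0.2: same as dividing by 5
    betas = [z * sigma_beta for z in deviates[years:]]    # S.D. 0.1: same as dividing by 10
    xs, ys = [x0], [y0]
    for alpha, beta in zip(alphas, betas):
        x_prev, y_prev = xs[-1], ys[-1]
        xs.append((a + alpha) * y_prev)
        ys.append((1.0 + beta) * x_prev)
    return xs, ys


xs, ys = simulate_diagram_ii(seed=1938)
print(len(xs), len(ys))  # 32 32   (x_0 ... x_31 and y_0 ... y_31)
```

Setting both standard deviations to zero gives back a deterministic system of the type (16). Plotting `xs` and `ys` for a few different seeds shows how even small random variation in the coefficients turns a stiff curve into one that resembles an observed series.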


It seems to me that the proper way of approaching economic problems
mathematically is by equations of the above type, in finite or infinitesimal
differences, with coefficients that are not constants but random variables:
what are called random or stochastic equations. They can embody the a priori
hypotheses concerning the machinery of the economic processes studied (that is
to say, hypotheses like those of Ragnar Frisch just described), and they
will also leave room for chance variation, which seems to be an essential feature
of any real time series.


It must be noticed, however, that this tool is not yet quite ready for
use in economic studies. The theory of random differential and other equations,
and the theory of random curves, are just starting their existence. The
number of papers published on the subject is small.*


We must understand what a solution of a random equation in finite
differences may represent. For simplicity I shall consider only the case of two
time series, $x_t$ and $y_t$, as in the graphs just discussed. Obviously the
solution of a system of two random equations cannot give a unique system of values
of $x_t$ and $y_t$ for each moment $t$, $x_0$ and $y_0$ being fixed; in this way the
random equations differ from any ordinary difference equations. Instead
of a unique set of values for $x_t$ and $y_t$, we have a probability distribution of
them, generally depending on the values $x_0, x_1, \dots, x_{t-1}$; $y_0, y_1, \dots, y_{t-1}$,
which the variables $x$ and $y$ had at previous moments. Having this probability
distribution, say $p(x_t, y_t \mid x_0, \dots, x_{t-1};\, y_0, \dots, y_{t-1})$, we could calculate
the most probable values of $x_t$ and $y_t$, and this, in certain cases, may be an
interesting result. In fact, this would be the form of prediction of future
values of the two variables. The product

$$\prod_{i=t_0+1}^{t} p(x_i, y_i \mid x_0, \dots, x_{i-1};\, y_0, \dots, y_{i-1}) \tag{18}$$

would give the probability law of all the $x_i$ and $y_i$, starting with the


* The first contribution in which a particular problem was solved seems to
be due to Harold Hotelling: "Differential equations subject to error, and
population estimates," J. Amer. Stat. Assoc. 22, pp. 283-314, 1927. The first author
to discuss the general theory of random differential equations seems to be S.
Bernstein; I shall cite his report read before the International
Mathematical Congress in Zurich, 1932, and a paper by him entitled "Principes
de la théorie des équations différentielles stochastiques," published in the
Memoirs of the Stekloff Institute, Leningrad Academy, Vol. V, pp. 95-124, 1933.
Finally, I shall mention a paper on the theory of random curves by E. Slutsky:
"Qualche proposizione relativa alla teoria delle funzioni aleatorie," Giornale
dell'Istituto Italiano degli Attuari, Anno VIII, pp. 183-199, 1937.
$(t_0+1)$-st pair and ending with the $t$-th. Integrating this with respect to
$x_{t_0+1}, y_{t_0+1}, \dots, x_{t-1}, y_{t-1}$, we could obtain
$p(x_t, y_t \mid x_0, \dots, x_{t_0};\, y_0, \dots, y_{t_0})$, the probability law of $x_t$ and $y_t$
depending on the given known values of $x_0, x_1, \dots, x_{t_0}$ and $y_0, y_1, \dots, y_{t_0}$,
which it would be possible to use for predicting the values of the two variables
corresponding to a moment $t - t_0$ units of time ahead of $t_0$.


Finally, the above product (18) corresponding to $t_0 = 0$ would give the
probability law of all the systems of the $x_i$ and $y_i$, starting with the first
and finishing with the $t$-th. This could be used, for example, for calculating
the most probable shape of the curves representing the two time series.


Our present knowledge permits us to perform all those operations for
particular problems, but the number of general results concerning them is not
large. Besides, even the solution of particular problems presents great
technical difficulties.


All the above steps depend on the assumption that the machinery of a
given economic process is known to be expressed by a given system of random
equations. In fact, they are examples of deductions that are to be made from
such equations. We must now face the problem of how to decide whether any
given system of random equations does or does not represent the machinery of
a given process that is described by some time series, either already observed
or to be observed in the future.


This problem forms a chapter, as yet untouched, in the theory of
testing statistical hypotheses. Here again we shall probably have to wait a
considerable time till all the tools that are necessary for economic research
are ready. To explain the problems that I see in this section of the work,
it will be useful to discuss a simple example.


Suppose that we are able to observe the figures $x_t$ and $y_t$ measuring the
mutual purchases of the urban and rural populations respectively, and that we
wish to test the hypothesis, say $H_0$, that the whole machinery of the economic
process could be described by the random equations**

$$x_t = (a_0 + \alpha)\,y_{t-1}, \qquad y_t = (b_0 + \beta)\,x_{t-1}, \tag{19}$$

where $a_0$ and $b_0$ are some unknown constants and $\alpha$ and $\beta$ are two random variables,
known to be normally distributed about zero with unknown S.D.s $\sigma_1$ and $\sigma_2$
respectively. The initial values of the two variables, $x_0$ and $y_0$, will be assumed
known.
* The above remarks are closely connected with the theory of the so-called Markoff
chains, forming a special branch of the theory of probability. See V. Romanovsky:
"Recherches sur les chaînes de Markoff", Acta Mathematica, t. 66, 1935; and
J. V. Uspensky: Introduction to Mathematical Probability, McGraw-Hill, 1937.


** It is understood that these equations are cited only as an
example and that it is not suggested that they, or any other in this paper, must
represent the actual economic processes under consideration.
It will be seen that the above hypothesis $H_0$, or rather the
statement of what is considered as known, is a little artificial. In practice
we should probably hesitate to assume normality of the variables $\alpha$ and $\beta$.
However, as I have said, the theory of testing statistical hypotheses is at
present only in statu nascendi, and the above assumption is made in order to
make easier the calculations necessary for the deduction of the test.
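Under the normality assumption just made, and assuming in addition that $\alpha$ and $\beta$ are independent of each other and from year to year, each factor of the product (18) for the hypothesis (19) is a product of two normal densities. Given the past, $x_t$ is normal with mean $a_0 y_{t-1}$ and S.D. $\sigma_1 |y_{t-1}|$. Likewise, $y_t$ is normal with mean $b_0 x_{t-1}$ and S.D. $\sigma_2 |x_{t-1}|$. The minimal Python sketch below evaluates the logarithm of (18) and the most probable next pair. The function names are hypothetical, and the parameters are treated as known, which is exactly what the test described next cannot assume.

```python
import math


def normal_logpdf(v, mean, sd):
    """Log density of N(mean, sd^2) at v."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (v - mean) ** 2 / (2.0 * sd * sd)


def log_product_18(xs, ys, a0, b0, s1, s2, t0=0):
    """Logarithm of the product (18), from pair t0 + 1 to the last, when (19) holds.

    Given the past, x_t ~ N(a0 * y_{t-1}, (s1 * y_{t-1})^2) and
    y_t ~ N(b0 * x_{t-1}, (s2 * x_{t-1})^2); alpha and beta are assumed
    independent of each other and from year to year.
    """
    total = 0.0
    for t in range(t0 + 1, len(xs)):
        total += normal_logpdf(xs[t], a0 * ys[t - 1], s1 * abs(ys[t - 1]))
        total += normal_logpdf(ys[t], b0 * xs[t - 1], s2 * abs(xs[t - 1]))
    return total


def most_probable_next(xs, ys, a0, b0):
    """Mode of p(x_{t+1}, y_{t+1} | past) under (19): (a0 * y_t, b0 * x_t)."""
    return a0 * ys[-1], b0 * xs[-1]


xs, ys = [1.0, 1.1, 1.0], [1.0, 1.05, 1.2]
print(log_product_18(xs, ys, a0=1.0, b0=1.0, s1=0.2, s2=0.1))
print(most_probable_next(xs, ys, a0=1.0, b0=1.0))  # (1.2, 1.0)
```

Taking `t0` greater than zero conditions on the first $t_0$ pairs, as in the passage on prediction $t - t_0$ units ahead.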


A test of the hypothesis $H_0$ will be conceived as a rule that is, in a
sense, similar to the rules found for games of chance. We consider the
situation existing before the values of the $x_t$ and $y_t$ are actually observed, and
we denote by $W$ all the possible systems of $n$ pairs of such values, for
$t = 1, 2, \dots, n$, that may be given by future observations. Denote generally
by $E$ any such system, so that $W$ will be the set of all the $E$. We agree that,
as a result of the test, we shall take one of the following steps regarding
$H_0$: we shall either reject it or refrain from doing so ("accept" it, for
short). The process of choosing a test for the hypothesis $H_0$ thus becomes
equivalent to the division of the set $W$ into two parts, say $w$ and $W - w$, with
the intention of rejecting $H_0$ whenever the observed values of the $x$ and $y$
fall within $w$, and of accepting $H_0$ when they fall in $W - w$. The problem is
thus reduced to that of choosing properly the set $w$, which is designated as
the critical region of $W$. This may be attempted in various ways, but I shall
describe briefly only one of them, leading to the kind of test
called unbiased of type B.*


Not being able to say in advance whether the hypothesis tested, $H_0$,
is right or not, we have to consider the consequences of any particular choice
of the critical set $w$, both when $H_0$ is true and when it is not.


Assume first that $H_0$ is true. Then we shall want the set $w$ to be so
chosen that the probability of the values of the $x$ and $y$ to be observed falling
within $w$ should be rather small, say $P = 0.05$, or $P = 0.01$, or the like. In
general there are considerable difficulties in finding a set
satisfying this condition, since the hypothesis tested, $H_0$, may be true and
yet the values of the unspecified constants $a_0$, $b_0$, $\sigma_1$, and $\sigma_2$ may vary within
broad limits, and this will usually influence the probability of $E$ falling
within this or that part $w$ of $W$. The set $w$ that is wanted is what is called
"similar" to $W$ with respect to $a_0$, $b_0$, $\sigma_1$, and $\sigma_2$, and has its "size" equal
to $P$. This means that it must possess the property that, whatever the values
of the parameters $a_0$, $b_0$, $\sigma_1$, and $\sigma_2$, if $H_0$ is true, the probability of $E$
falling within $w$ always has the same value, namely $P$.


In the particular case under consideration there is an infinity of sets
$w$ satisfying this condition and, so far as the situations where $H_0$ is true
are concerned, any one of them will be equally good as a critical set.


Next we must consider the case where $H_0$ is wrong. Here we should like
the critical set to possess the property that the probability of $E$ falling
within it should be as great as possible. This is a little too vague a
statement, but we see at once that we are brought to consider the probabilities of $E$


* J. Neyman: "Sur la vérification des hypothèses statistiques composées,"
Bull. Soc. Math. de France, t. 63, 1935.
falling within $w$ corresponding to cases where $H_0$ is wrong. To be able to
calculate these probabilities and to compare their values, we must make some
assumptions concerning the various ways in which the hypothesis $H_0$ may be
wrong. In other words, to have the mathematical problem determinate, we
must specify what is the set of admissible hypotheses that are contradictory
to $H_0$.


Alternatively, we may specify not the whole set of admissible hypotheses
contradictory to $H_0$, but only some of them, with regard to which we
desire our test to be particularly sensitive. This step must be made, since
otherwise the problem of choosing between all possible sets $w$, all of the
same size $P$ and all similar to $W$ with respect to $a_0$, $b_0$, $\sigma_1$, and $\sigma_2$, will not
be determinate.


Let us turn to the particular hypothesis $H_0$ considered and see how
we could specify the set $\Omega$ of admissible hypotheses contradictory to $H_0$. This,
of course, could be done in various ways. First of all, we could make a purely
negative statement: if $H_0$ is wrong, then the variables $x_t$ and $y_t$ do not
satisfy equations (19). In this case the set $\Omega$ would consist of all possible
hypotheses describing the distribution of the $x$ and $y$, with the exception of
$H_0$. Starting with this statement, it would perhaps be possible to find an
appropriate test, but by attempting to make it sensitive to all possible
deviations from the hypothesis $H_0$ we shall cause it to be not sensitive enough to
some particular ones among them that may be regarded as especially important.


One of the possibilities in this respect is suggested by the work of
Ragnar Frisch. In fact, he suggests that the machinery of the exchange between
the shoemaker and the farmer should perhaps include an adjustment arising from
the indebtedness of the parties. Of this, however, he is not quite sure and,
when discussing examples referring to the equations (15), he finds that the most
realistic of them is the one in which $c$ is positive and $d$ is equal to zero. In
such a case the indebtedness of the farmer would influence the purchases of
the shoemaker, but not his own.


Treating the problem from this point of view, we could consider more
specifically the set of hypotheses, say $\Omega_F$, stating that the $x_t$ and $y_t$ satisfy
equations of the form

$$x_t = (a + \alpha)\,y_{t-1}, \qquad y_t = (b + \beta)\,x_{t-1}, \tag{20}$$

similar to Eqs. (19) but differing from them in having either $a$ or $b$ or both
not absolute constants but dependent on $G_{t-1}$ (page 115) or, more generally,
on $t$. In particular, we may have in mind situations where $a$ and $b$ in Eqs. (20)
are of the form

$$a = a_0 + a_1 t, \qquad b = b_0 + b_1 t, \tag{21}$$

$a_0$, $b_0$, $a_1$, and $b_1$ being constant. Let us then denote by $\Omega_F$ the set of
hypotheses ascribing to $a_0$, $b_0$, $a_1$, and $b_1$ any real values, and let us consider
the tests that would be, in a sense,* most sensitive with respect to this


* See lecture III, pp. 47 and 48 in particular. — vate i , es 
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| particular set of alternatives. I emphasize the fact that we do not necessarily 
_ believe that all possible hypotheses are those included in Op, but we simply wish 
| our test to be particularly sensitive to cases where one of the hypotheses in 

Qf happens to be true, ; 


We may try to select tests of two hypotheses, H₁ and H₂, defined as follows: H₁ affirms that a₁ = 0 but does not specify what could be the values of a₀, b₀, and b₁. H₂ affirms that b₁ = 0, but does not specify what could be the values of a₀, b₀, and a₁. These hypotheses are of quite similar character, and it will be sufficient to consider only one of them, e.g. H₁.


In selecting a test for H₁, we may try to select the critical region w so that (i) whenever the hypothesis that a₁ = 0 is correct, the probability of E falling within w will be P = 0.05 (say), and (ii) whenever a₁ deviates from zero, the probability of E falling within w is increased to the greatest possible value. A test based on the critical region having these properties is unbiased of type B, and it will be most satisfactory for detecting cases in which a₁ ≠ 0. (The critical region just referred to is described on page 121.)


Performing the calculations described in my paper referred to above (p. 121), it may easily be shown that the test under consideration reduces to the following steps:


(a) Calculate the ratios q_t = x_t/y_{t-1} for t = 1, 2, ..., n.

(b) Calculate the regression coefficient f of q_t on t.

(c) Calculate

$$s_f^2 = \frac{\sigma_q^2\,(1 - r^2)}{(n-2)\,\sigma_t^2} \qquad (22)$$

for an estimate of the variance of the regression coefficient of q on t, where σ_q² and σ_t² are the variances of the observed values of q and t respectively, and r the correlation between them, all calculated from the sample.

(d) Reject the hypothesis H₁ (that a₁ = 0) whenever |f|/s_f > t', where t' is the particular value of t read from R. A. Fisher's tables for the chosen value of P (= 0.05), and for degrees of freedom = n - 2.
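Steps (a) to (d) can be sketched in Python as follows. This is an illustration under stated assumptions, not Neyman's own computation: the function name ratio_trend_test is hypothetical, the critical value t' is passed in as if read from a table of Student's t, and the usage applies it only to the rows t = 18 to 29 of the diagram II columns of the table of coordinates at the end of this discussion (reading the first column of each pair as x_t and the second as y_t). The resulting f is therefore not the value -0.0315 reported for the full series. Swapping the two arguments applies the same steps to y_t/x_{t-1}, i.e. gives the corresponding test of H₂.

```python
import math


def ratio_trend_test(x, y, t_crit):
    """Sketch of steps (a)-(d): regress q_t = x_t / y_{t-1} on t and test for a trend.

    x, y   -- consecutive observed values x_t, y_t (same length)
    t_crit -- the value t' read from a table of Student's t for the chosen P
              and n - 2 degrees of freedom, n being the number of ratios q_t
    Returns (f, s_f, reject).
    """
    # (a) ratios q_t = x_t / y_{t-1}; the first pair has no predecessor
    q = [x[i] / y[i - 1] for i in range(1, len(x))]
    n = len(q)
    t = list(range(1, n + 1))  # shifting the origin of t changes neither f nor s_f
    q_bar, t_bar = sum(q) / n, sum(t) / n
    # sample moments with divisor n ("calculated from the sample")
    var_q = sum((qi - q_bar) ** 2 for qi in q) / n
    var_t = sum((ti - t_bar) ** 2 for ti in t) / n
    cov_qt = sum((qi - q_bar) * (ti - t_bar) for qi, ti in zip(q, t)) / n
    r = cov_qt / math.sqrt(var_q * var_t)
    # (b) regression coefficient of q on t
    f = cov_qt / var_t
    # (c) formula (22): estimated standard error of f
    s_f = math.sqrt(var_q * (1 - r ** 2) / ((n - 2) * var_t))
    # (d) reject whenever |f| / s_f exceeds t'
    return f, s_f, abs(f) / s_f > t_crit


# Rows t = 18, ..., 29 of the diagram II columns in the table of coordinates
x = [25.753, 40.830, 43.136, 43.668, 60.002, 53.878,
     71.308, 56.189, 65.098, 49.256, 52.173, 36.536]
y = [29.395, 29.384, 37.645, 41.324, 50.306, 62.882,
     50.484, 70.452, 56.077, 52.860, 47.759, 51.860]
# 11 ratios, hence 9 degrees of freedom; tabled t' = 2.262 for P = 0.05
f, s_f, reject = ratio_trend_test(x, y, t_crit=2.262)
print(f"H1: f = {f:.4f}, s_f = {s_f:.4f}, reject: {reject}")
print("H2 rejected:", ratio_trend_test(y, x, t_crit=2.262)[2])
```

On these twelve rows the slope of q_t on t is clearly negative and H₁ is rejected, while the ratios y_t/x_{t-1} show no significant trend, which agrees qualitatively with the results described in the next paragraph.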


Of course similar steps should be taken if it were desired to test the hypothesis H₂ that the coefficient b of Eqs. (20) does not depend on time, as is the situation if b₁ = 0.





The above steps were performed on the coordinates x_t and y_t that were used in the two curves in diagram II, page 118. H₂ was tested first. As was expected, the test did not detect any "evidence" that b₁ ≠ 0, and thus that b in Eqs. (20) is not a constant. On the other hand, upon testing H₁ (the hypothesis that a₁ = 0) it was found that

f = -0.0315,    s_f = 0.0047,

whereupon the application of rule (d) would lead to the rejection of the hypothesis that a₁ = 0 in Eqs. (21). If these calculations had been performed on some real data and not on an artificial example, then the next step in the research would have been with the economist, who would have to think of ways of altering the hypothesis H₀ so as to bring it into better agreement with the observations. Perhaps he would think of Eqs. (15) or (16). If appropriate methods were at hand, then the hypothesis that he might formulate could also be tested. A number of such steps, with an increasing volume of observational data, will, we hope, eventually give an econometric theory worthy of the name. However, this is a question for the future. At present, we should concentrate on preparing the necessary tools.


DR. LOUIS H. BEAN: One of the problems that the analysis of time series runs into is the fact that the successive data of x and y are themselves intercorrelated; that is, your successive values of x are correlated with one another, and successive values of y are also. With that type of problem, is it possible to apply the usual correlation technique, which rests, I assume, on the assumption that you have no dependence between the successive items in a given series?


DR. NEYMAN: I think that if this question is taken in its full generality, then the answer is in the negative. But there are cases where the correct solution of the problem is reducible to the ordinary correlation analysis applied in some particular form. To illustrate this point we may use the example of the time series of diagram II, page 118, just discussed. It is obvious that any two consecutive values of x are in a sense correlated, their most probable variations being as in diagram I, page 116. However, as I have pointed out, and as may be verified by detailed calculation, the test of the hypothesis that a₁ = 0 is reduced to what is essentially the ordinary correlation analysis of q and t. But this circumstance is closely connected with the particular hypothesis H₁ we desired to test and with the set of alternative hypotheses with respect to which we desired our test to be particularly sensitive. I may confess here that in constructing this example I made some efforts to formulate both the hypothesis tested and the circumstances considered as known, so as to be able to arrive at the final result. It happened to be as I have described. But in the process of constructing the example I considered a few other hypotheses to be tested and the corresponding sets of alternatives that suggested themselves as reasonable. I am sorry to say that in none of these prospective examples was I able to complete the calculations leading to the unbiased test of type B; but it was clear that these tests would not be covered by what is now called correlation analysis.

My opinion is that in the question you ask no wholesale rule could be given, and that each problem should be (i) properly stated: what is the hypothesis to be tested, what is considered as known, and what are the alternative hypotheses; (ii) next, each such problem should be considered and analyzed by itself. Later on, we shall certainly arrive at a classification of problems into various types, but this will not be today.


Coordinates for plotting the diagrams on pages 116 and 118.

        Solution of the equations      Solution of the equations
        for diagram I                  for diagram II
  t        x_t        y_t                 x_t        y_t
 12     22.452     26.626              18.498     20.354
 13     36.744     22.452              31.325     16.926
 14     26.673     36.744              28.706     25.561
 15     42.660     26.673              41.639     25.864
 16     28.673     42.660              23.614     34.810
 17     48.163     28.673           illegible     35.125
 18     31.884     48.163              25.753     29.395
 19     54.617     31.884              40.830     29.384
 20     34.339     54.617              43.136     37.645
 21     57.839     34.339              43.668     41.324
 22     36.125     57.839              60.002     50.306
 23     59.748     36.125              53.878     62.882
 24     37.967     59.748              71.308     50.484
 25     60.823     37.967              56.189     70.452
 26     39.372     60.823              65.098     56.077
 27     62.283     39.372              49.256     52.860
 28     40.474     62.283              52.173     47.759
 29     63.716     40.474              36.536     51.860
STATISTICAL ESTIMATION,
Practical Problems and Various Attempts to
Formulate their Mathematical Equivalents


A conference held in the auditorium of the Department of Agriculture, 
8th April 1937, 10 a.m., Mr. Alexander Sturges, presiding. 


I hope that this conference will be easier than the others because it will be mathematical. Some people think that mathematics is difficult, but whoever is acquainted with mathematics knows that it is easier than anything else, because it deals with clearly defined ideas, rules, and hypotheses that are not obscured by the circumstances of life, which are very, very complicated; certainly the most complicated thing in the world is life and the world outside of us.


But the problem that I am going to be concerned with this morning is connected with life; and in fact, any mathematical problem could be traced to some practical problem, as in geometry, engineering, surveying, physics, and so on. When I speak about the problems of estimation, I shall have two aspects in mind: one is the practical problem of estimation, and the other a mathematical model of it. What is the practical problem of estimation? The statistician cannot study the whole population in which he is interested. This population may be, for instance, the population of farms in the United States. If for some reason it is inexpedient to study all the farms, the only thing we can do is to draw a random sample out of this population, and try to judge from it what are the properties of the population. This is a practical problem of estimation, and no question of probability is involved, nor any of mathematics. As a matter of fact, it is obviously hopeless to try to get from the sample the exact properties of the population. What we can hope for are some figures which "presumably" are not very wrong if treated as characters of the population.

"Presumably" is a term referring to our state of mind, but it is not a scientific term. In order to treat the problem mathematically, the mathematician must translate the requirements of practical statistics into his language. He must substitute something definite for the general idea concerning estimation. The first thing that he must do is to put the problem of practical estimation into mathematical form. In doing so he will have to deal with probability, because probability is the only mathematical concept that has something to do with the vague idea of "presumably." If we analyze the situation we shall find that probability comes in not only at the end, but at the beginning of the problem of estimation. A sample from which the population is to be studied must be "properly" drawn; otherwise the theory of probability is not applicable. Also, we must understand the sense in which probability statements are to be interpreted. If we want a mathematical picture of the problem, we shall probably say that the statistician is able to obtain some numbers x₁, x₂, and so on, these being measured values in a random sample. We must know something definite about the method of drawing the sample; we also know something about the population, perhaps very little; and we want to know something more about it.

The knowledge available frequently determines the general form of the integral probability law* of the x_i. However, owing to the fact that our knowledge about the population is not complete, our knowledge of the integral probability law cannot be complete either. In frequent cases (and only these will be considered below) the elementary probability law* of the x_i exists, its form is known, and the only things about which we are doubtful are the values of several parameters entering the elementary probability law.

It will be sufficient to consider the case when there are only two unknown parameters, which we shall denote by θ₁ and θ₂. The case when the number of unknown parameters is greater is completely analogous. In order not to forget about the parameters that are unknown, we shall use for the elementary probability law the notation p(E | θ₁, θ₂), which is to be the symbol for the elementary probability law of the sample point* E, calculated for the particular values θ₁ and θ₂ of the parameters. The problem of estimation is how to use the observed values of the x_i in order to estimate one or both of the unknown parameters.

The situation may be exemplified by the properties of an instrument of measurement that induce us to believe that the measurements x₁, x₂, ..., x_n follow a normal law in which, however, the mean and the standard deviation are unknown.


What I have just said seems to be a common element in mathematical models of the problem of estimation as it is treated by various authors. In the following I shall give a short review of various ways of completing the model, and thus of formulating the mathematical problem in its final dress.


The first attempt at solving the question of how to arrive at "presumable" values of θ₁ and θ₂ was based on a theorem of Thomas Bayes, published posthumously in 1763.** This applies to the situation when not only the x_i are random variables, but θ₁ and θ₂ are also; that is to say, it applies when we sample not only the values of the x_i but also the populations. We must imagine that we have a set of different populations, and out of them we pick one at random this time, another or perhaps the same one tomorrow, and so on. The formula of Bayes is familiar and I shall not go into it. I shall merely point out that it applies only when the parameters themselves are random variables, and moreover, only when their probability distribution is known. When this is so, and we have also the observed values of the x_i forming a random sample from a population picked at random, we may apply the formula of Bayes either to calculate the posterior probability that θ₁ and θ₂ have values within any assigned limits, or to calculate the values of θ₁ and θ₂ that are the most probable and consider them as estimates.

* For definitions of probability laws, etc., see the first two lectures, page 16 in particular.

** Thomas Bayes, Phil. Trans. Roy. Soc. London 53, 370-418, 1763.
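In the situation just described, where the parameter really is a random variable with a known distribution, the posterior probability given by Bayes' formula is also a long-run frequency. The Python sketch below illustrates this with an invented example (the two populations, with θ = 0.3 and θ = 0.7, their prior probabilities 0.8 and 0.2, and the binomial sample of size 10 are assumptions chosen for illustration only): a population is picked at random, a sample is drawn from it, and among the repetitions that give 7 successes we count how often the population with θ = 0.7 was the one picked.

```python
import random
from math import comb


def posterior(prior, x, n):
    """Bayes' formula for a parameter taking finitely many values.

    prior -- dict mapping each possible theta to its (known) probability
    x, n  -- observed number of successes in n binomial trials
    """
    joint = {th: p * comb(n, x) * th ** x * (1 - th) ** (n - x)
             for th, p in prior.items()}
    total = sum(joint.values())
    return {th: v / total for th, v in joint.items()}


def simulate(prior, x_obs, n, trials, seed=1):
    """Relative frequency of each theta among repetitions that happen to give x_obs."""
    rng = random.Random(seed)
    thetas, weights = list(prior), list(prior.values())
    counts = {th: 0 for th in thetas}
    for _ in range(trials):
        th = rng.choices(thetas, weights)[0]           # pick a population at random
        x = sum(rng.random() < th for _ in range(n))   # draw a sample from it
        if x == x_obs:
            counts[th] += 1
    hits = sum(counts.values())
    return {th: c / hits for th, c in counts.items()}


prior = {0.3: 0.8, 0.7: 0.2}
print(posterior(prior, x=7, n=10)[0.7])                      # about 0.881
print(simulate(prior, x_obs=7, n=10, trials=100_000)[0.7])   # a frequency close to it
```

The agreement depends entirely on the prior being the actual distribution from which the populations are drawn; with a prior that does not describe any such sampling, the posterior probability has no frequency interpretation of this kind.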

The mathematical model of the practical problem of estimation connected with Bayes' theorem is perfect by itself; however, there are serious difficulties with its application, which I shall now mention.


In most practical cases the parameters are not random variables. We don't sample them. Consider, for instance, the population of persons who were living in Washington at a certain date of the year 1935. This is perfectly constant, and its properties are not subject to any random variation. On the other hand, if we sample randomly from this population and the x_i denote some characteristics of the individuals drawn, then there will be no difficulty in considering the x_i as random variables. In fact, we usually take elaborate precautions in the method of sampling to make sure that the x_i do possess the properties of random variables according to some probability law. Nothing of this sort could be done with the properties of the population just mentioned, and they must accordingly be considered as constants. Therefore, in this and in many other problems, the formula of Bayes is not applicable, because it refers to a different situation, that in which the parameters themselves are random variables. Sometimes, of course, the parameters may actually be random variables, but the application of the formula of Bayes requires not only this, but also knowledge of the probability law of the parameters, since this probability law enters the formula and must be known.


Different scientists have advanced different ideas of how the difficulty could be overcome. This was in the early days of the theory of probability, when there was a confusion between the two different elements that I spoke of yesterday evening, namely, the mathematical conception of probability and our psychological idea of probability.* They were confused very much, and a principle was advanced called the principle of insufficient reason. This principle says that whenever we do not know anything about the value of θ₁, we are allowed to assume that the probability of θ₁ lying within any interval is simply proportional to the length of that interval. The probability law of θ₁ would then be represented by a rectangle. In other words, if we have no sufficient reason to assume that some particular value of θ₁ is more probable than any other, then we may assume that the probability distribution of θ₁ is constant. On this principle, there will be no difficulty in using the formula of Bayes to calculate the most probable values of θ₁ and θ₂, or the probability that they fall within any given ranges. However, it is important to be clear about the range of validity of such calculations; they can be no better than the assumptions put into the formula.

* Cf. page 32.


My second lecture was concerned with the empirical law of big numbers. You remember its general meaning: if we start with the correct mathematical model of a set of random experiments, such, for example, as sampling the population of Washington, and deal with probabilities as I have defined them, then whenever we find by calculation that the probability of some specified event is equal to α, the frequency of that event in repeated experiments will actually approach α. In other words, if we consider only probabilities of the kind I have described, the empirical law of big numbers permits the prediction of frequencies of future events, the probabilities of which we are able to calculate. It all depends upon having the proper mathematical model or, alternatively, arranging the experiments to fit the model.


When judging the principle of insufficient reason, it is important to remember that the assumed constant probability of θ₁ falling within any interval of a fixed size is not a probability of the kind related to the empirical law of big numbers. The former is a probability describing, as somebody has said, the state of our mind; and usually it has no relation whatever to the frequencies of θ₁ having this or that value. The consequence of this is that the most probable value of θ₁, calculated from Bayes' theorem with the application of the principle of insufficient reason, may be described as the one that we are ready to believe in most strongly, but in general it will not be the one that we shall most frequently meet in practice. Our sampling technique in obtaining the x_i may be perfect, and the law of big numbers in everything concerning the x_i may work well;* and then, having obtained from Bayes' formula and the principle of insufficient reason that the probability of 1 < θ₁ < 2 is 0.999, we may be disappointed to find that the frequency of cases in which θ₁ does fall within these limits is negligible.

* That is, the correspondence between relative frequencies actually obtained in practice and the probabilities calculated from a given mathematical model on the basis of a certain value of θ₁ may be very good (see Lecture I, page 18a, and Lecture II, page ...).

All this justifies further attempts to get something better than the principle of insufficient reason.

There were other principles advanced for overcoming the difficulty with Bayes' theorem. Of those, I shall mention two very briefly. One was advanced by Gauss,** but not in a very clear way. It was developed and put in practical form by a famous Russian mathematician, Markoff, whose book is now translated and will appear soon in the Annals of Mathematical Statistics.* Markoff was not a statistician; he was a mathematician. What I shall say will be exactly equivalent to what he says, but it will have a form more familiar to statisticians.

** See e.g. Gauss, Theoria combinationis observationum erroribus minimis obnoxiae, pars prior, page 49 (Göttingen, 1821); also J. F. Encke, Jahrbuch für 1834 (Berlin, 1832), pp. 284-285.


Suppose we are interested in the value of the parameter θ. At our disposal we have the values of x₁, x₂, etc., and they are random variables. You know the conception of mathematical expectation; I shall denote the mathematical expectation of any variable u by E(u). Now let us consider some function F of x₁, x₂, ..., x_n. Any estimate of θ will be a function of them. I shall say that the function F is an unbiased estimate of θ if the expectation of F is identically equal to θ, i.e. if E(F) = θ. That is to say, if, whatever be the properties of the sampled population and whatever be the probability law of x₁, x₂, and so on, the expectation of F is equal to θ, then this F will be called an unbiased estimate of θ, made from x₁, x₂, ..., x_n.


Now I shall define what I shall call a best unbiased estimate. There are many functions whose expectation is identically equal to θ. Therefore, we are allowed to choose between several unbiased estimates. If we are to choose with a purpose, we must agree on what is the best quality. A good quality of F to consider is its standard error. What is the standard error of F? I shall define not the standard error, but the standard error squared, or the

$$\text{Variance of } F = E(F - \theta)^2, \qquad (1)$$

and I shall say that whenever the variance of F is diminished, the estimate F is improved. Then the best estimate, the best unbiased estimate, will be the one of minimum variance.


Markoff has shown how we can calculate in various cases the best of the unbiased estimates which are linear functions of the x_i. Suppose, then, that we can find the function F for the best unbiased estimate; having obtained a sample, we may substitute the values of x₁, x₂, ... into the function F, and we may risk saying that the result we obtain is equal to θ, or perhaps not very much different from θ.


This statement has a justification provided by the theorem of Bienaymé-Tchebycheff, to the effect that the probability (in the sense of the theory of probability I am using) that the value of F will differ from its expectation θ by more than t times the standard error of F does not exceed 1/t², whatever may be t > 1. In various cases this probability may be shown to be much smaller than the limit t⁻².
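The inequality can be illustrated by simulation. In the Python sketch below, F is assumed, for illustration only, to be the mean of n independent normal observations, which is an unbiased estimate of their common mean θ with standard error σ/√n; the values of θ, σ, n and the number of repetitions are arbitrary. For such samples the observed frequencies come out well below 1/t², in line with the remark that in various cases the probability is much smaller than the limit.

```python
import math
import random


def exceedance_frequency(t, theta=5.0, sigma=2.0, n=10, reps=20_000, seed=2):
    """Relative frequency with which |F - theta| exceeds t standard errors of F,
    F being the mean of n independent normal observations with mean theta."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)  # standard error of F
    count = 0
    for _ in range(reps):
        f = sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        if abs(f - theta) > t * se:
            count += 1
    return count / reps


for t in (1.5, 2.0, 3.0):
    print(t, exceedance_frequency(t), "bound 1/t^2:", 1 / t ** 2)
```

For normal observations the frequency at t = 2 is near 0.046, against the general bound 0.25; the bound itself holds for any distribution with a finite standard error.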


* The editor understands from Frank M. Weida, who is editing the translation of Markoff's book, that it will appear in supplements to the Annals of Mathematical Statistics, 50 to 75 pages in each of four or five of the quarterly issues, beginning perhaps in December 1937. Later the book will be assembled and printed as a separate volume.
You will notice, however, that in spite of this justification, the use of the unbiased estimates is based on a principle, and that it is not a solution of any properly stated mathematical problem. One might ask, for example, whether there are not some other functions of the x_i, different from the unbiased estimates, the values of which differ by a given amount from the estimated parameter still less frequently than those of the best unbiased estimate. But the dogmatic character of the unbiased estimates will be best seen from a comparison with other estimates based on a competitive principle, such as the principle of likelihood, which is again something artificially brought in and not directly inherent in the theory of probability.


QUESTION: The term unbiased estimate refers to the particular definition you have used. Does the term best unbiased estimate refer to the two definitions that you gave?


DR. NEYMAN: Yes.* The best unbiased estimate is such a function of the sample that its expectation is equal to θ identically, and the variance of the function is smaller than that of any other unbiased estimate.


QUESTION: Can you use "unbiased estimate" basing it on some 
different definition not necessarily involving variance, or does best 
unbiased estimate always imply that particular definition? 


DR. NEYMAN: In the past, the words "best unbiased estimate" have been defined in this way, but somebody may use them to describe a different conception. There is no difficulty about this. It is rather unfortunate that in the theory of probability and statistics we frequently use very suggestive terms. "Best unbiased estimate": if people are not very sophisticated they will think that it is the best; but it is only called the best, and somebody else may call something else the best. We must clearly distinguish between what is the best and what is called the best.


I shall now describe another principle, invented to solve the problem of estimation. So far as I know, it was invented in 1895 by the late leader of mathematical statistics, Karl Pearson.** He said that if the probability law of the sample depends on θ₁ (it might depend on other parameters too, but that doesn't matter; we are interested in estimating this one), then the optimum estimate of θ₁, which we may denote by θ̂₁, is the value of θ₁ for which the probability p(E | θ₁, θ₂) of the observed values x₁, x₂, ..., x_n is greater than for any other value of θ₁.

* E. J. G. Pitman, in a recent article in the Proc. Cambridge Phil. Soc. 33, 212-222, April 1937, uses "biased" in a different sense; he says an estimate is biased if it is more frequently too large (small) than too small (large). Editor.

** Karl Pearson, "Regression, heredity and panmixia," Phil. Trans. Royal Soc. 187A, 253-318, 1895; p. 265 in particular. The method of maximum likelihood seems to have been first used by Helmert, Astronomische Nachrichten 88, No. 2096, 1876. Editor.


In other words, the numerical value of the probability p(E | θ₁, θ₂) will depend on θ₁, i.e. p(E | θ₁, θ₂) is a function of θ₁. Now according to the principle just described, from among the possible values of θ₁ I shall pick the one (θ̂₁) that gives the greatest probability to the sample point E actually obtained. This is approximately the wording of Karl Pearson. He used the principle to obtain the now familiar formula for calculating the correlation coefficient: the sum of products of the deviations from the means, divided by the square root of the product of the sums of squares. He said that this is the thing to do because if we assume that the population correlation coefficient ρ is equal to that in the sample, then the probability p(E | ρ) of the sample will be greater than for any other value of ρ. However, Pearson did not insist upon this principle; he did not apply it in many other cases in which he was faced with the problem of estimation. This has been done with great emphasis by R. A. Fisher. He calls the expression p(E | θ) the likelihood of θ for the observed sample E = (x₁, x₂, ..., x_n), and he considers its value the measure of our confidence in the particular value of θ used in the expression. The value of θ that maximizes the likelihood he calls either the optimum estimate or the maximum likelihood estimate, and he says this is the value in which we should have the greatest confidence.
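Pearson's formula, as just described, can be written as a short Python sketch; the function name and the sample numbers are illustrative only.

```python
import math


def correlation(x, y):
    """Sample correlation coefficient: the sum of products of deviations from
    the means over the square root of the product of the sums of squares."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


print(correlation([1, 2, 3, 4], [2, 4, 5, 9]))  # 11 / sqrt(130), about 0.965
```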


Again, this is a new principle, saying that if you want to have the "best estimate," just calculate the value of θ that maximizes the likelihood. But you will have to believe that it is the best. This is an arbitrary principle, similar to the principle of insufficient reason and similar to the principle advanced by Markoff on unbiased estimates. It is a principle that we may accept or reject just because we like it or are inclined to disbelieve it. However, there are no special reasons for believing that it is better than anything else, primarily because we did not formulate in advance what quality of an estimate we agree to consider to be of the greatest importance.


As a matter of fact, it has been possible to prove various properties of the maximum likelihood estimates (m.l. estimates, for short) that provide a certain justification for their use. The justification that seems to be the main one is of the same character as the one that I quoted in favor of the best unbiased estimates. Its general effect is that, under certain limiting conditions, when all the observations are mutually independent and their number n indefinitely increases, it becomes less and less probable that the m.l. estimate will differ by more than any given amount from the parameter that is being estimated.


It is interesting to consider the relationship of these two principles, that of Markoff on unbiased estimates and the principle of likelihood. What is their relationship? Do they give the same results or do they give different ones? Sometimes the maximum likelihood estimate is identical with the best unbiased estimate; then there is no competition between the two principles and everything is all right. Sometimes we are able to find only one of the two possible estimates, either the maximum likelihood estimate or the best unbiased estimate, simply because the equations that we need to solve to get both are too difficult and we are able to solve for only one. Here again there is no question of competition. But sometimes, when we are able to determine both the maximum likelihood estimate and the best unbiased estimate, the two do not agree, and then we are in doubt which to use, and it may be difficult to choose between them.


I recall an example that is probably familiar, but in which perhaps the question of the disagreement of the principles is not known to all. Suppose that each of the x_i follows the same normal probability law; then the elementary probability law of each of the x_i is

$$p(x_i \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right], \qquad (2)$$

where both σ and μ are unknown. If the x_i are independent of one another, then the elementary probability law of all of them is*

$$p(E \mid \mu, \sigma) = \frac{1}{(\sigma\sqrt{2\pi})^{n}} \exp\left[-\frac{\sum (x_i - \mu)^2}{2\sigma^2}\right], \qquad (3)$$

depending on the two parameters σ and μ.


Suppose that the x_i have been fixed by observation, and we wish to find a maximum likelihood estimate of σ. Differentiating (3) with regard to σ and equating the derivative to zero, we get

$$\sigma^2 = \frac{1}{n}\sum (x_i - \mu)^2. \qquad (4)$$

If, as is assumed, μ is unknown, then this equation does not provide us with an estimate of σ. The value of σ that would maximize the likelihood whatever be μ does not exist. The difficulty seems to be overcome if we use the principle of likelihood for a simultaneous estimation of both σ and μ. Thus we may look for a system of values of both σ and μ which would ascribe to the likelihood a value exceeding all others. As a matter of fact, this is the method by which the estimate of correlation was found. Differentiating the likelihood with respect to μ and equating the derivative to zero, we obtain the equation

$$\mu = \frac{1}{n}\sum x_i = \bar{x}. \qquad (5)$$

This equation, taken simultaneously with (4), gives us the maximum likelihood estimates of both μ and σ², the latter being

$$s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2. \qquad (6)$$

* Throughout this conference the subscript i in the summations will be understood to run from 1 to n, i.e., over the whole sample.
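A quick numerical check that x̄ and s² of (5) and (6) maximize the likelihood (3) jointly: the Python sketch below evaluates the logarithm of (3) at the closed-form estimates and at slightly perturbed values of μ and σ. The sample values are invented for illustration.

```python
import math


def log_likelihood(xs, mu, sigma):
    """Logarithm of the normal likelihood (3) for independent observations xs."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - ss / (2 * sigma ** 2)


xs = [4.1, 5.3, 3.8, 6.0, 5.2, 4.7]
n = len(xs)
mu_hat = sum(xs) / n                          # equation (5)
s2 = sum((x - mu_hat) ** 2 for x in xs) / n   # equation (6), divisor n
best = log_likelihood(xs, mu_hat, math.sqrt(s2))
for d_mu, d_sigma in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    other = log_likelihood(xs, mu_hat + d_mu, math.sqrt(s2) + d_sigma)
    assert other < best  # every perturbed point gives a smaller likelihood
print(mu_hat, s2)
```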


But as is well known, the best unbiased estimate of σ² is (say)

$$s'^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}. \qquad (7)$$

Thus the two estimates are not equal: maximum likelihood gives s², while the "best unbiased" estimate is s'² = ns²/(n - 1).
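The difference between s² and s'² shows up in their averages over repeated samples. In this Python sketch the population parameters, the sample size n = 5 and the number of repetitions are arbitrary choices; besides normal samples it also uses exponential samples, since s'² is unbiased for σ² whatever the distribution (provided σ² is finite), whereas the average of s² is (n - 1)σ²/n in either case.

```python
import random


def average_variance_estimates(draw, n, reps, seed=3):
    """Average, over many samples of size n, of s^2 (divisor n, eq. (6))
    and of s'^2 (divisor n - 1, eq. (7))."""
    rng = random.Random(seed)
    sum_s2 = sum_s2_prime = 0.0
    for _ in range(reps):
        xs = [draw(rng) for _ in range(n)]
        x_bar = sum(xs) / n
        ss = sum((x - x_bar) ** 2 for x in xs)
        sum_s2 += ss / n
        sum_s2_prime += ss / (n - 1)
    return sum_s2 / reps, sum_s2_prime / reps


n, reps = 5, 100_000
# normal population with sigma = 2, so sigma^2 = 4
print(average_variance_estimates(lambda r: r.gauss(0.0, 2.0), n, reps))
# exponential population with rate 0.5, so sigma^2 = 1 / 0.5**2 = 4
print(average_variance_estimates(lambda r: r.expovariate(0.5), n, reps))
# in both cases s^2 averages near (n - 1) / n * 4 = 3.2 and s'^2 near 4
```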


The difference is small, but it exists; and the question which of the two estimates is the better one has no meaning. It is remarkable that some people who say they believe strongly in maximum likelihood estimates in actual practice use the best unbiased estimates of σ². It is deplorable that it is not understood that the question whether the maximum likelihood or the best unbiased estimate should be used is one of taste only. If this were clearly understood, then there would be no room for
unnecessary polemics. The. people would sce that the choice between s* 
and’ s'® is more or less. thet between the scents of Coty and Houbigant, 
or between wines from Bordeaux and Bourgogne, or between "Scotch" and 
wisi ah? 


DR. DEMING: There is nothing unique about a best unbiased estimate. An unbiased estimate of σ² gives a biased estimate of σ, or of σ³, or of any function of σ other than σ². Likewise, an unbiased estimate of σ gives a biased estimate of σ², etc.


DR. NEYMAN: You are perfectly right, and this is just one of the arguments advanced in favor of the maximum likelihood estimates. These have the property that if T is a maximum likelihood estimate of θ, then T² will be one of θ², etc. There are many such properties of the two kinds of estimates which are sometimes considered as "proving" that one or the other is better. I doubt, however, whether they are very persuasive. With respect to what you have mentioned, one could ask, for example, why should we require that the estimate of σ be the square root of that of σ²? If our purpose is to estimate σ² and we like unbiased estimates, then we should use s'². On the other hand, if we are interested in σ, we should use

$$ s\,\sqrt{\frac{n}{2}}\;\frac{\Gamma\!\left(\frac{n-1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)} = s\,\sqrt{\frac{n}{2\pi}}\;B\!\left(\frac{n-1}{2}, \frac{1}{2}\right) \qquad (8) $$

which, the xᵢ being normal, will be an unbiased estimate of σ.


It may be mentioned also that while s² is a maximum likelihood estimate of σ² only if the variables considered are normally distributed, s'² has the property of being an unbiased estimate of σ² whatever be the distribution of the xᵢ, provided only that σ² is finite. This is an argument in favor of s'². As I have said, there are many important properties that are frequently quoted to support one or the other of the two principles, so many that there is hardly time enough to enumerate them all.
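As a check on Eq. (8), the sketch below (an illustration added here, assuming normal xᵢ) computes the factor multiplying s in both of its forms, using `math.lgamma` so that large n does not overflow, and then verifies by simulation that s itself underestimates σ on average while the corrected value does not. The function names are hypothetical, and n must be at least 2.

```python
import math
import random


def sigma_unbiasing_factor(n):
    """Factor multiplying s in Eq. (8), gamma form: sqrt(n/2) * G((n-1)/2) / G(n/2)."""
    return math.sqrt(n / 2.0) * math.exp(math.lgamma((n - 1) / 2.0) - math.lgamma(n / 2.0))


def sigma_unbiasing_factor_beta(n):
    """The same factor, beta form: sqrt(n/(2*pi)) * B((n-1)/2, 1/2)."""
    a, b = (n - 1) / 2.0, 0.5
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.sqrt(n / (2.0 * math.pi)) * math.exp(log_beta)


def mean_of_estimates(n, sigma, trials, seed=1):
    """Average s (square root of Eq. 6) and the corrected value of Eq. (8)."""
    rng = random.Random(seed)
    factor = sigma_unbiasing_factor(n)
    total_s = total_corrected = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
        total_s += s
        total_corrected += factor * s
    return total_s / trials, total_corrected / trials


if __name__ == "__main__":
    # For n = 4 and sigma = 1: s averages about 0.80, the corrected value about 1.
    print(mean_of_estimates(n=4, sigma=1.0, trials=40000))
```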


MR. FRIEDMAN: It is true, is it not, that if you take the distribution of the sum of squares ns², and get the maximum likelihood estimate of σ from that, you will get n - 1 in the denominator?*


DR. NEYMAN: Yes, it is true; but if you imply that the maximum likelihood estimate of σ is what I have denoted by s', and not s, because of the circumstance you mention, then I should disagree. (Or rather, I would say that the definition of a m.l. estimate, as you seem to interpret it, is not sufficiently categoric to apply to one unique statistic at a time.) Taking the distribution of ns² and maximizing it to get the m.l. estimate of σ implies that this must be a function of the sum of squares Σ(xᵢ - x̄)². If you do this, I would ask why you do not start with the distribution of the mean deviation, say M = (1/n) Σ|xᵢ - x̄|, or with that of the range R = x₍ₙ₎ - x₍₁₎, x₍₁₎ and x₍ₙ₎ being the smallest and the largest of the n observations. In each case you will be able to find a maximum likelihood estimate of σ, but one of them will be a function of M, and the other a function of R. They would be different, both would differ from s', and the question would arise which to choose. The choice would require some new principle in addition to that of maximizing the likelihood.


My impression is that the originator (footnote page 132) of m.l. estimates had in mind taking the original elementary probability law of the xᵢ, in a form like Eq.(3), and finding the values of the parameters which, for a given sample point E, would maximize it.


MR. FRIEDMAN: If you make a more or less "double" application of maximum likelihood whenever you don't come out with the best unbiased estimate, sometimes you do arrive at it later. First, you get s² (see Eq. 6); then, if you apply maximum likelihood to the distribution of s², you will get ns²/(n - 1), an unbiased estimate* of σ². If you now take this estimate, get its distribution and apply maximum likelihood again, you come back to the same unbiased estimate of σ².
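Mr. Friedman's "double" application can be followed numerically. The sketch below is an added illustration, assuming normal xᵢ so that ns²/σ² and (n - 1)s'²/σ² both follow the chi-square law with n - 1 degrees of freedom. It keeps only the terms of each log-likelihood that involve v = σ² and maximizes them by a crude grid search; the helper names and the search range [0.01, 5] are assumptions suited to the small sample used below. The first maximization should land on ns²/(n - 1), and the second should return the same value.

```python
import math


def loglik_from_s2(v, s2, n):
    """Log-likelihood of v = sigma^2 given only s^2, using n*s^2/v ~ chi-square(n - 1).
    Terms not involving v are dropped."""
    k = n - 1
    return -0.5 * k * math.log(v) - n * s2 / (2.0 * v)


def loglik_from_s2_prime(v, s2p, n):
    """Log-likelihood of v given only s'^2, using (n - 1)*s'^2/v ~ chi-square(n - 1)."""
    k = n - 1
    return -0.5 * k * math.log(v) - k * s2p / (2.0 * v)


def grid_argmax(f, lo, hi, steps=100000):
    """Return the grid point in [lo, hi] where f is largest (f assumed unimodal)."""
    best_v, best_f = lo, f(lo)
    for i in range(1, steps + 1):
        v = lo + (hi - lo) * i / steps
        fv = f(v)
        if fv > best_f:
            best_v, best_f = v, fv
    return best_v


def double_ml(xs):
    """Return (s^2, m.l. value from the law of s^2, m.l. value from the law of s'^2)."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / n
    first = grid_argmax(lambda v: loglik_from_s2(v, s2, n), 0.01, 5.0)
    second = grid_argmax(lambda v: loglik_from_s2_prime(v, first, n), 0.01, 5.0)
    return s2, first, second


if __name__ == "__main__":
    s2, first, second = double_ml([2.1, 3.4, 1.9, 4.2, 2.8])
    print(s2, first, second)  # first and second both close to 5 * s2 / 4
```

Setting the derivative of either log-likelihood to zero gives the same results in closed form: v = ns²/(n - 1) for the first and v = s'² for the second.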


DR. NEYMAN: My point is that if you start with your original distribution of the xᵢ, and if you maximize the probability so as to obtain the original maximum likelihood estimate, this estimate does not necessarily have the properties of the unbiased estimate. By your device, you may obtain something that will have the property of being an unbiased estimate, but then it will not have the property of maximizing the likelihood.


In order to illustrate the dogmatic character of the two principles, of unbiased and of m.l. estimates, I will give one more example.

Consider a case where it is known that all the xᵢ that may be given by observation are mutually independent, and that each of them follows the same probability law


* This was Helmert's way of arriving at s'² = ns²/(n - 1) as an estimate of σ², in 1876; see reference cited on page 142. Editor.

$$ p(x \mid \theta) = \begin{cases} \theta^{-1} & \text{for } 0 < x < \theta, \\ 0 & \text{for any other value of } x. \end{cases} \qquad (9) $$


This is what is called a rectangular distribution with an unknown range θ, starting at zero. It is desired to estimate θ.


The maximum likelihood estimate of θ is the greatest of the xᵢ observed in a sample, and I shall denote this estimate by g. The elementary probability law of g is easily found to be

$$ p(g \mid \theta) = n\,\theta^{-n} g^{n-1} \quad \text{for } 0 < g < \theta \qquad (10) $$

and zero elsewhere, where n denotes the size of the sample, as before. The mathematical expectation of g is

$$ E(g) = \frac{n\theta}{n+1} \qquad (11) $$

so that g is not an unbiased estimate of θ. An unbiased estimate, say g₁, will be provided by

$$ g_1 = \frac{n+1}{n}\, g \qquad (12) $$
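The bias shown by Eq. (11), and its removal in Eq. (12), are easy to confirm by simulation. This is an added sketch with a hypothetical helper `simulate_max`; it draws samples from the rectangular law (9) with `random.uniform` and averages g and g₁ over many samples.

```python
import random


def simulate_max(n, theta, trials, seed=2):
    """Average g = max(x_i) and g1 = (n + 1) * g / n over `trials` samples of size n
    drawn from the rectangular law on (0, theta)."""
    rng = random.Random(seed)
    total_g = 0.0
    for _ in range(trials):
        total_g += max(rng.uniform(0.0, theta) for _ in range(n))
    mean_g = total_g / trials
    # g1 is a fixed multiple of g, so its average is the same multiple of mean_g.
    return mean_g, (n + 1) * mean_g / n


if __name__ == "__main__":
    # For n = 3 and theta = 10, Eq. (11) gives E(g) = 7.5, while E(g1) = 10.
    print(simulate_max(n=3, theta=10.0, trials=50000))
```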
Which of the two should we use? Consider more closely the particular case where n = 3, so that the elementary probability law of g is represented by a parabola. To take g as an estimate of θ is to assume that the observed g is exactly equal to θ.


[Figure: graph of p(g | θ) for n = 3.]





Judging from the graph, this of course could be roughly described as "the most frequent" value of g, but it is almost certain that in any practical case g will be smaller than θ. It is certain also that g cannot exceed θ, so that, using g as an estimate, we are bound to make errors always of the same sign. This may induce us to use g₁ as an estimate rather than g. Doing so, we shall make errors sometimes positive and sometimes negative, and the average of these errors would tend to become zero. But, on the other hand, to take g₁ as an estimate is to presume that the value of g we observe is equal to its expectation E(g) = ¾θ (see Eq. 11). If we look at the diagram representing p(g | θ), we shall see that it is much more probable that g will fall between ¾θ and ¾θ + ¼θ than between ¾θ and ¾θ - ¼θ. This may be considered strongly against g₁.


Shall we use as an estimate something halfway between g and g₁? But here still another consideration comes in. The probability of g exceeding ¾θ is 1 - (¾)³ = 0.578, and therefore the probability of its falling short of ¾θ is 0.422. Wouldn't it be better to assume that the value of g we observe is (say) z, so defined that the probability of its being exceeded by chance is exactly equal to 0.5? It is easily seen that generally

$$ z = \theta \sqrt[n]{\tfrac{1}{2}} \qquad (13) $$

and that, therefore, the corresponding estimate of θ would be (say)

$$ g_2 = g\,\sqrt[n]{2} \qquad (14) $$
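For the case n = 3 the figures quoted above follow directly from Eq. (10), since P{g ≤ cθ} = cⁿ for 0 ≤ c ≤ 1. The short sketch below (an added illustration; the function names are hypothetical) recomputes them and evaluates the three candidate estimates g, g₁ and g₂ for one observed value of g.

```python
def prob_g_exceeds(c, n):
    """P{g > c * theta} when g is the largest of n draws from (0, theta): 1 - c**n."""
    return 1.0 - c ** n


def median_ratio(n):
    """z / theta from Eq. (13): the point exceeded by g with probability exactly 1/2."""
    return 0.5 ** (1.0 / n)


def candidate_estimates(g, n):
    """The three competing estimates of theta for one observed g."""
    return {
        "g": g,                          # maximum likelihood
        "g1": (n + 1) * g / n,           # unbiased, Eq. (12)
        "g2": 2.0 ** (1.0 / n) * g,      # based on the median of g, Eq. (14)
    }


if __name__ == "__main__":
    n = 3
    print(prob_g_exceeds(0.75, n))                            # 0.578125
    print(prob_g_exceeds(0.5, n) - prob_g_exceeds(0.75, n))   # 0.296875
    print(candidate_estimates(6.0, n))
```

The second printed value is the probability of g falling between ½θ and ¾θ, about half the 0.578 of the interval between ¾θ and θ, which is the comparison made against g₁.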


It is seen that we are involved in a dispute without a backbone. Any suggestion to use g or g₁ or g₂ is a dogma, and anyone may choose the one he likes and call it "best." Until we have found a generally acceptable form of the problem of estimation, it is useless to insist that one or the other of the suggested estimates is the best.


Attempting to reach an acceptable solution of the problem of estimation, we must bear in mind all the circumstances of the problem and its aims. The relevant points seem to be as follows.


1° Any attempt to estimate a parameter θ implies the desire (i) to make a statement concerning the value of θ and (ii) to avoid errors in this statement.


2° Any statement concerning θ will have to depend on the values of some random variables provided by observation. Therefore, if the statement is made according to some rule, it will be a function of the random variables and in consequence it will have the property of randomness; it will be subject to the law of great numbers. This means that, knowing the probability law of the xᵢ, we may attempt to calculate probabilities of our statement having this or that property and in particular of its being correct. If it is found that the probability of a statement concerning θ being correct is 0.99 when it is made according to a specified rule, then we shall know that, in the long run, the rule will actually lead to correct statements in about 99 percent of all cases in which it is applied.


3° From this point of view, it is hopeless to look for a solution of the problem of estimation in the form

$$ \theta = \text{some specified function of the } x_i = T(E), \text{ say.} $$


In fact, whatever the function T(E), if θ can possess any value out of a finite or infinite interval, the probability of T(E) being exactly equal to θ must be zero. It follows that, in most cases, the problem of a unique estimate treated from the point of view of 1° and 2° has no solution. But, as a matter of fact, the practical statistician must be aware of this circumstance; moreover, what he is actually doing suggests that he has already given up the idea of a unique estimate. If you look through any number of recent statistical publications, you will find that the results of estimating means, correlations, etc. are invariably given in the same form, T ± S_T, where T is an estimate deduced from this or that principle and S_T the estimate of its standard error. This manner of writing, and also the comments on the results, suggest that the practical statistician has in mind indicating an interval extending from T minus some more or less vaguely specified multiple k₁ of S_T to T plus some other multiple k₂ of S_T, in which "presumably" the true value of the estimated parameter is contained.


This last circumstance is the main part of my point 3°, which we have to remember when formulating the problem of estimation: the form of statement that it is desired to make concerning the value of θ is

$$ \underline{\theta}(E) < \theta < \overline{\theta}(E) \qquad (15) $$

where θ̲(E) and θ̄(E) are some functions of the xᵢ. The familiar T - k₁S_T and T + k₂S_T are only traditional forms of these functions, and it is difficult to say in advance whether and when they are satisfactory.


4° If it were possible to define more than one pair of functions θ̲(E) and θ̄(E), each having the property that, estimating θ in the form (15), we shall be correct in a fixed and sufficiently large percentage of cases, then we could choose the one that conforms with our view on the accuracy of estimation. Frequently, but not always, it will be the pair giving generally the narrowest intervals, i.e., the pair that makes the interval

$$ \overline{\theta}(E) - \underline{\theta}(E) \qquad (16) $$

as narrow as possible.
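Point 4° can be illustrated with the rectangular law (9). The sketch below is an added illustration, not a rule given in the lecture: it takes two interval rules built from g that are both correct with the same probability α, namely g < θ < g(1 - α)^(-1/n) and g((1 + α)/2)^(-1/n) < θ < g((1 - α)/2)^(-1/n), checks their success frequency by simulation, and compares their widths. Both widths are fixed multiples of g, so by Eq. (11) their expected widths are those multiples times nθ/(n + 1). The function names are hypothetical.

```python
import random


def one_sided_rule(g, n, alpha):
    """Interval g < theta < g * (1 - alpha)**(-1/n)."""
    return g, g * (1.0 - alpha) ** (-1.0 / n)


def equal_tails_rule(g, n, alpha):
    """Interval g * ((1 + alpha)/2)**(-1/n) < theta < g * ((1 - alpha)/2)**(-1/n)."""
    return g * ((1.0 + alpha) / 2.0) ** (-1.0 / n), g * ((1.0 - alpha) / 2.0) ** (-1.0 / n)


def success_frequency(rule, theta, n, alpha, trials, seed=5):
    """Fraction of simulated samples for which the stated interval contains theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        g = max(rng.uniform(0.0, theta) for _ in range(n))
        lower, upper = rule(g, n, alpha)
        hits += lower < theta < upper
    return hits / trials


def width_per_unit_g(rule, n, alpha):
    """Width of the interval divided by g (the same ratio for every sample)."""
    lower, upper = rule(1.0, n, alpha)
    return upper - lower


if __name__ == "__main__":
    n, alpha, theta = 3, 0.95, 1.0
    for rule in (one_sided_rule, equal_tails_rule):
        freq = success_frequency(rule, theta, n, alpha, trials=20000)
        expected_width = width_per_unit_g(rule, n, alpha) * n * theta / (n + 1)
        print(rule.__name__, round(freq, 3), round(expected_width, 3))
```

Both rules are correct in about 95 percent of cases, but for n = 3 the first gives intervals roughly 1.71g wide against roughly 2.41g for the second, so point 4° would favor it.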


It will be noticed that the four above points differ essentially from the principles advanced as solutions, or rather as substitutes for a solution, of the problem of estimation. In fact, none of the four points is dogmatic.





The points 2° and 3° simply describe the situation and they do not contain any "you should do this or that." The other two points do contain something of that sort; namely, point 1° contains "you should try to make erroneous statements as rarely as possible," and point 4°, "you should try to make your statements concerning θ as precise as possible," in the sense that you would prefer


naan & ' (17) 


rather than

$$ 1 < \theta < 4 \qquad (18) $$


but these "you should" are not dogmas. Whoever takes the trouble to make some observations and to work them out mathematically must have these two "you should" in mind. Otherwise, he would probably offer an estimate of any parameter simply by opening a book of logarithms and reading the first figure that his eye fell on, or the like.


The first three of the above points lead to a mathematical problem as follows:

Knowing the probability law p(E | θ₁, θ₂) of the xᵢ, the problem of estimating θ₁ is to determine two functions of the xᵢ, namely θ̲(E) and θ̄(E), satisfying the condition

$$ \underline{\theta}(E) < \overline{\theta}(E) \qquad (19) $$

and such that, if θ₁° is the true value of the parameter θ₁, then the probability of θ̲(E) satisfying the inequality

$$ \underline{\theta}(E) < \theta_1^{\circ} \qquad (20) $$

and at the same time of θ̄(E) satisfying the inequality

$$ \theta_1^{\circ} < \overline{\theta}(E) \qquad (21) $$

is identically equal to a number α, close to unity and chosen in advance. (This α is different from the α on pp. 49-88.)
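A worked instance may help here. The rectangular law (9) has no second parameter, and one rule that meets the requirement (an added illustration, not a construction from the text) is θ̲(E) = g and θ̄(E) = g(1 - α)^(-1/n), g being the largest observation. By Eq. (10), P{g < θ < g(1 - α)^(-1/n)} = P{θ(1 - α)^(1/n) < g < θ} = 1 - (1 - α) = α, whatever the true θ. The sketch below checks this by simulation for several true values; the function names are hypothetical.

```python
import random


def lower_upper(sample, alpha):
    """Hypothetical rule: lower = g, upper = g / (1 - alpha)**(1/n), with g = max(sample)."""
    n = len(sample)
    g = max(sample)
    return g, g / (1.0 - alpha) ** (1.0 / n)


def coverage(theta_true, n, alpha, trials, seed=3):
    """Fraction of simulated samples whose limits satisfy both (20) and (21)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.uniform(0.0, theta_true) for _ in range(n)]
        lower, upper = lower_upper(sample, alpha)
        hits += lower < theta_true < upper
    return hits / trials


if __name__ == "__main__":
    # The frequency should be close to alpha = 0.9 for every true value tried.
    for theta_true in (0.5, 3.0, 40.0):
        print(theta_true, coverage(theta_true, n=3, alpha=0.9, trials=20000))
```

The frequency does not depend on θ because g/θ has the same distribution for every θ; this is what makes the probability "identically" equal to α.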


Point 4° indicates how to choose between the solutions of this problem if there is more than one. It must, however, be made more precise.

This is the mathematical problem of statistical estimation as I understand it, and in my next conference I will give you some indications toward its solution.


MR. PAGE: I have a very simple question; I didn't understand clearly the maximum likelihood method of estimating σ². Could you take the square root of the estimate of σ²? Does that give you the maximum likelihood estimate of σ itself?


DR. NEYMAN: Yes. Say T = max. likelihood estimate of θ. Then T² is the maximum likelihood estimate of θ², and so on. If I consider as my parameter not θ but any function of θ, then the maximum likelihood estimate will be the same function of T. This does not apply to the unbiased estimate.
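The property Dr. Neyman describes can be seen numerically. In the sketch below (an added illustration with hypothetical helper names; μ is fixed at x̄ as in Eq. (5), and the search ranges are assumptions suited to the sample used) the normal log-likelihood is maximized once as a function of σ and once as a function of v = σ². The second maximizing value is the square of the first, and both reproduce s² of Eq. (6).

```python
import math


def normal_loglik(mu, sigma, xs):
    """Log of Eq. (3), omitting the constant -n * log(sqrt(2 * pi))."""
    n = len(xs)
    return -n * math.log(sigma) - sum((x - mu) ** 2 for x in xs) / (2.0 * sigma ** 2)


def grid_argmax(f, lo, hi, steps=20000):
    """Return the grid point in [lo, hi] at which f is largest."""
    best_t, best_f = lo, f(lo)
    for i in range(1, steps + 1):
        t = lo + (hi - lo) * i / steps
        ft = f(t)
        if ft > best_f:
            best_t, best_f = t, ft
    return best_t


def ml_in_two_parametrizations(xs):
    """Maximize the likelihood over sigma and, separately, over v = sigma**2."""
    mu_hat = sum(xs) / len(xs)
    sigma_hat = grid_argmax(lambda s: normal_loglik(mu_hat, s, xs), 0.01, 5.0)
    v_hat = grid_argmax(lambda v: normal_loglik(mu_hat, math.sqrt(v), xs), 0.0001, 25.0)
    return sigma_hat, v_hat


if __name__ == "__main__":
    sigma_hat, v_hat = ml_in_two_parametrizations([2.1, 3.4, 1.9, 4.2, 2.8])
    print(sigma_hat ** 2, v_hat)  # both close to s^2 = 0.7176
```

No such agreement holds for unbiasedness: s'² is unbiased for σ², but its square root is not unbiased for σ.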


MR. WALLIS: Doesn't Fisher claim that maximum likelihood solutions will always be minimum-variance solutions also? I thought that Fisher claimed that he would get the "best" estimate by the method of maximum likelihood.


DR. NEYMAN: I am aware of these claims.* However, the proofs advanced by Professor Fisher to support them were not considered satisfactory by many mathematicians, and recently several interesting papers have appeared on the subject. As a result, many of Fisher's statements, partly in a modified form and under certain limiting conditions, proved to be correct. I do not remember whether the particular claim you mention was found correct or wrong, but I will quote here papers by Hotelling, Doob, Dugué, and Pitman, where you are likely to find the answer.**


But my point is that the question whether the variance of the m.l. estimate is minimum or not is not relevant from the point of view of the goodness of the estimate itself. In the above example, the variance of g is smaller than that of g₁, but does this circumstance prove the absolute superiority of g over g₁?
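The variances in question follow from Eq. (10): E(g²) = nθ²/(n + 2), so Var(g) = nθ²/((n + 1)²(n + 2)) and Var(g₁) = θ²/(n(n + 2)). The sketch below (added here; the names are hypothetical) computes these exactly with `fractions.Fraction`, in units of θ². It also computes the mean square error, variance plus squared bias, a criterion not discussed in the lecture. By variance g is ahead; by mean square error g₁ is ahead for every n ≥ 2. Which estimate "wins" depends on the criterion chosen.

```python
from fractions import Fraction


def var_g(n):
    """Var(g) / theta^2 for g = max of n rectangular(0, theta) observations."""
    return Fraction(n, (n + 1) ** 2 * (n + 2))


def var_g1(n):
    """Var(g1) / theta^2, with g1 = (n + 1) * g / n."""
    return Fraction(n + 1, n) ** 2 * var_g(n)


def mse_g(n):
    """Mean square error of g / theta^2: variance plus squared bias (theta / (n + 1))^2."""
    return var_g(n) + Fraction(1, n + 1) ** 2


def mse_g1(n):
    """g1 is unbiased, so its mean square error equals its variance."""
    return var_g1(n)


if __name__ == "__main__":
    for n in (2, 3, 10):
        print(n, var_g(n), var_g1(n), mse_g(n), mse_g1(n))
```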


* R. A. Fisher, Messenger of Mathematics 41, 155-160, 1912; Phil. Trans. Royal Soc. 222A, 309-368, 1922; Proc. Cambridge Phil. Soc. 22, 700-725, 1925; Proc. Royal Soc. 144A, 285-307, 1934; J. Royal Stat. Soc. 98, 39-82, 1935.

** Harold Hotelling, Trans. Amer. Math. Soc. 32, 847-859, 1930.

J. L. Doob, Trans. Amer. Math. Soc. 36, 759-775, 1934; 39, 410-421, 1936; also Annals of Math. Stat. 6, 160-169, 1935.

Daniel Dugué, Comptes rendus 202, 193-195, 452-454, 1732-1734, 1936.

E. J. G. Pitman, Proc. Cambridge Phil. Soc. 32, 567-579, 1936; ibid. 33, 212-222, 1937.
Having reread the above draft of the conference, I find that it may suggest the idea that in my opinion all the previous work on the theory of estimation is more or less useless. I want to emphasize that this suggestion would be entirely wrong. All the attempts to treat the problem in this or that way were necessary steps, marking an advance and permitting further steps ahead. And probably for a long time still, all of us will sometimes calculate a best linear estimate and use the Markoff theorem, sometimes the m.l. estimate, and also, whenever circumstances permit, the Bayes' formula.


Apart from this, I must point out a remarkable circumstance that we may frequently notice in many branches of science. This is that the uncontrollable intuition of the practical worker suggests to him the proper solution of his problem while he is entirely helpless in giving reasons why he is proceeding in this or in that way. If he is pressed for these reasons, he frequently produces a principle that has the appearance of proving something, but which actually proves nothing.

To illustrate my point, I will mention the familiar history of vaccination, invented, as they say, by country women long before anything like modern serology had been started.


The role of a rigorous scientific theory is consequently very modest, and is reduced to explaining to the practical man, sometimes with certain difficulty, how good is what he knew himself to be good long ago.


In particular, the theory of estimation, of which I will speak in more detail in the next conference, shows that many of the familiar formulas like T ± S_T are the best that could be formed: proceeding in this way, we get statements concerning the estimated parameters in their most exact form, and we also attain the relatively greatest frequency of correct statements.


This is an illustration of a statement by Laplace which I like very much: "The theory of probabilities is at bottom nothing but common sense reduced to calculation; it makes us appreciate with exactness what judicious minds feel by a sort of instinct, often without being able to account for it." After this quotation one might perhaps ask whether theory is of any use at all to the practical man. I think it is. He is occasionally in doubt, and then a theory is useful. Sometimes, also, his ineffable instinct is actually misleading.
AN OUTLINE OF THE THEORY OF CONFIDENCE INTERVALS.


Conference with Dr. Neyman at the Department of Agriculture, 9th April 1937, 10 a.m., Dr. Frederick V. Waugh presiding.


This morning I shall start with the problem of estimation as we formulated it at the close of yesterday's conference (page 140), this being the form that I consider the proper one. It will be a mathematical problem.*


Consider n random variables x₁, x₂, ..., xₙ, dependent or independent, the values of which we can observe. These n observations will determine a point E (see page 16). Suppose that the probability law of the sample E (i.e. of x₁, x₂, ..., xₙ), though known in form, is written in terms of some two parameters θ₁ and θ₂ which are not known. There may be other parameters also, but for simplicity I shall consider that there are only two; when there are more, the situation is similar. I want to use the sample to make an estimate of one of the parameters, say θ₁.


When I say that I want to estimate θ₁, I mean that I want to have a way of calculating two functions of E, the sample point x₁, x₂, ..., xₙ, one function to be denoted by θ̲(E) and the other by θ̄(E). These will be called the lower and the upper estimates** of θ₁. I shall now describe the properties that I want the functions θ̲ and θ̄ to possess.

You notice first of all that, being functions of the sample x₁, x₂, ..., xₙ, they are both random variables, and both will vary from one sample to another as the sample point x₁, x₂, ..., xₙ varies. Since they are random variables, I may consider the probabilities of θ̲ and θ̄ lying within or without any specified ranges.


Let us denote by θ₁° the value of the parameter θ₁ which in my particular problem happens to be true. I don't know what this value is, but I denote it by θ₁°. Now one of the properties that I want the two thetas to possess is this: I want the probability†

$$ P\{\underline{\theta}(E) < \theta_1^{\circ} < \overline{\theta}(E) \mid \theta_1^{\circ}, \theta_2\} = \alpha \quad (\text{e.g. } 0.95 \text{ or } 0.99) \qquad (1) $$


* The reader will realize that this problem was touched upon earlier; see page 29.

** These are conveniently read "theta lower" and "theta upper". θ̲(E) and θ̄(E) will occasionally be abbreviated θ̲ and θ̄, the letter E being omitted for brevity. But they are nevertheless functions of the sample.

† To be read "the probability that, when θ₁° and θ₂ are the true values of the parameters, θ̲(E) is less than θ₁° and θ̄(E) is greater than θ₁°, is equal to α."


In words, the interval θ̲, θ̄ is to cover the true value θ₁° with frequency α, which I choose myself, as for instance 0.99, or something similar.

If I succeed in finding the functions θ̲ and θ̄ satisfying (1), then I shall call the number α the confidence coefficient, and the interval extending from θ̲(E) to θ̄(E) the confidence interval corresponding to the sample point E.


Now I shall emphasize something concerning Eq.(1) which I purposely did not mention clearly before, in order to have the opportunity of emphasizing it. You remember I said that θ₁° denotes the value of θ₁ that happens to be true, but I do not know what it is. (If I knew the true value of θ₁, there would be no question of estimating.) On the left-hand side of equality (1) I have only a symbol for θ₁°, and I can't put there any number instead of θ₁° because I don't know what this number is. Therefore, this equation should really be considered not merely as an equation but as an identity: it is to hold for all values of θ₁, as any one of them may happen to be the true one. But this is not all. You remember that we assumed that the elementary probability law p(E | θ₁, θ₂) depends on two parameters θ₁ and θ₂, both unknown. It follows that I desire the functions θ̲ and θ̄ to satisfy (1) also identically for all possible values of θ₂.





At first sight, this problem seems to be not an easy one, and not of a usual kind. What we have in all sorts of books on probability are formulas giving the probability that certain functions of the sample, such as x̄, or x̄/s, or x̄/σ, will fall below a certain number, or exceed a given number, when the properties of the probability law are all specified. In the present problem the situation is more complicated: we require a probability, calculated from a probability law depending upon θ₁ and θ₂, to have a specified value, whatever the values of θ₁ and θ₂. The ultimate solution, however, is easily obtained.


However, before going into the details of the solution, I want to make its purpose entirely clear. Suppose for a moment that we have succeeded in calculating the functions θ̲(E) and θ̄(E) satisfying (1) identically, and let us see how we could use them to produce a solution of the practical problem of estimating θ₁. The practical statistician is able to observe the xᵢ, and he wishes to know how these should be "used" for making some statement concerning the value of θ₁.


We may advise him to perform the following three steps which, together, are equivalent to a single random experiment:

(i) to observe the values of the xᵢ, called E,

(ii) to calculate the corresponding values of the functions θ̲(E) and θ̄(E), and

(iii) to state that θ̲(E) < θ₁ < θ̄(E).


You will notice that in this statement he may be correct or he may be wrong. But, owing to the properties of the functions θ̲ and θ̄ as expressed by Eq.(1), the probability of his being correct will be equal to α (e.g. 0.99). It follows that if the experiment is so arranged that the xᵢ do follow the elementary probability law that served for constructing the functions θ̲ and θ̄, then the empirical law of big numbers will guarantee that the practical statistician following the above advice will be correct in his statements concerning the value of θ₁ in about 99 percent of all cases.
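The three steps can be imitated by simulation. The sketch below is an added illustration under an assumption not made in the text: the xᵢ are normal with a known standard deviation σ, so that the only unknown parameter is the mean, and the familiar rule x̄ ∓ zσ/√n, with z the upper (1 - α)/2 point of the normal law (from `statistics.NormalDist`), serves as θ̲ and θ̄. The proportion of correct statements stays near α = 0.99 whatever true mean is used to generate the data. The helper names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def one_experiment(rng, mean_true, sigma, n, alpha):
    """Steps (i)-(iii) once; returns True if the statement made is correct."""
    # (i) observe the values of the x_i, i.e. the sample point E
    xs = [rng.gauss(mean_true, sigma) for _ in range(n)]
    # (ii) calculate the lower and upper estimates from E alone
    z = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
    half_width = z * sigma / n ** 0.5
    lower, upper = fmean(xs) - half_width, fmean(xs) + half_width
    # (iii) state that lower < mean < upper; the true mean is used only to score it
    return lower < mean_true < upper


def long_run_frequency(mean_true, sigma=1.0, n=5, alpha=0.99, trials=20000, seed=4):
    """Proportion of correct statements over many repetitions of the experiment."""
    rng = random.Random(seed)
    correct = sum(one_experiment(rng, mean_true, sigma, n, alpha) for _ in range(trials))
    return correct / trials


if __name__ == "__main__":
    for mean_true in (-5.0, 0.0, 12.3):
        print(mean_true, long_run_frequency(mean_true))
```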


The situation may be compared with that of a game of chance in which the probability of winning has a fixed value. In the case of roulette, for example, the gambler is allowed to bet in various ways, but whatever he may choose, the probability of the bank winning the game is fixed in advance, to the greater or smaller advantage of the bank. The uncontrollable choice of the gambler of how to bet corresponds to the possibility of θ₁ having this or that value. The choice of the rule for calculating the functions θ̲ and θ̄ corresponds to fixing the rules of the game, assuring that the probability of the bank winning (= the probability of the statistician making a correct statement on the value of θ₁) is fixed in advance and sufficiently large. The choice of the functions θ̲ and θ̄ is made according to the particular probability law p(E | θ₁, θ₂) which the xᵢ are assumed to follow, and correspondingly the rules of the game of roulette are fixed under the assumption that the ball stops equally frequently at each of the sectors. The actual frequencies of successes in both of the "games" depend essentially on whether these assumptions are, for each one, sufficiently well satisfied.

MR. FRIEDMAN: Your statement of probability that he will be correct in 99 percent of the cases is also equivalent to the statement, is it not, that the probability is 99 out of 100 that θ₁° lies between the limits as calculated by θ̲ and θ̄?


DR. NEYMAN: No.* This is just the point I tried to emphasize in my first two lectures, both in theoretical discussions and in examples. θ₁° is not a random variable; it is an unknown constant. In consequence, if you consider the probability of θ₁° falling within any limits, this may be either zero or unity, according to whether the actual value of θ₁ happens to be outside of these limits or within. The position is exactly as in my example on page 5 with the 1000th figure in the expansion of π = 3.14159..., which I denoted by X₁₀₀₀. You may remember that if any calculations led to the conclusion that the probability P{X₁₀₀₀ < 5} has a value different from both zero and unity, then these calculations are either wrong or else they must refer to some other theory of probability, different from the one I am using. The


* See, however, the editor's footnote on page 146. The point is that we must not speak of the probability of θ₁ lying within fixed limits, that is, limits that are not random variables. Editor.
connection with the empirical law of big numbers is to my mind a sufficient reason to deal essentially with the particular theory of probability that I have chosen. In consequence, in everything I said, and in what I am going to say, there is no room for a probability of θ₁ having this or that value, or fulfilling any given inequality, other than 0 or 1.


Referring to the "picturesque" way of speaking, according to the conventions established in my first two lectures, we may say that neither θ₁° nor X₁₀₀₀ is a random variable, because their values do not depend upon any random experiment. However, we could consider random variables, say θ₁ and x, having certain connections with θ₁° and X₁₀₀₀; namely, we could define a method of picking up at random any one of the ten digits 0, 1, 2, ..., 9 for a particular position (as e.g. the 1000th decimal) in this expansion of π. Any digit picked up could be equal to the 1000th decimal, X₁₀₀₀. Moreover, with regard to any one of the digits picked up, say x, we could perhaps assert that P{0 < x ≤ 5} = α or the like, depending on the behavior of the experiment. For instance, it would be possible, by exercising sufficient care, to arrange the experiment so that the frequency of the numbers 1, 2, 3, 4, and 5 approaches closely ½ of the total number of draws (see page 18a).


Similarly, we could consider a method of picking up at random probability laws p(E | θ₁, θ₂), differing among themselves by the values of the parameters θ₁ and θ₂, one of these values being θ₁ = θ₁°. The specification of this experimental method would be equivalent to the definition of the probability law of θ₁ and θ₂, from which we could calculate probabilities of their falling within any specified limits. In this way we should most certainly fall back on the calculations connected with Bayes' theorem (page 128). As I said yesterday, this theory is faultless by itself, but its applications are rare, because it is unusual for the probability laws of θ₁ and θ₂ to be known. In fact, we may assume, to a certain extent, the random character of the xᵢ, since the method of obtaining them is frequently under our control; but the variation of the values of the parameters θ₁ and θ₂ is usually beyond our control. Those are the reasons why the probability of θ̲(E) falling below θ₁° and θ̄(E) falling above θ₁°, θ̲(E) and θ̄(E) being random variables, is not equivalent to the probability of θ₁° falling within any assigned (fixed) limits.*


* I think Dr. Neyman would agree that one could speak of the probability of i < X₁₀₀₀, or of X₁₀₀₀ < j, or both, if i and j are to be chosen at random according to some specified scheme of chance. The value of P{i < X₁₀₀₀ < j} could range between 0 and 1, and would depend on the rules (i.e. the experiment) by which i and j are to be drawn. Editor.


Now we shall go on to see how in general we could obtain functions such as θ̲(E) and θ̄(E) satisfying (1). If I want to find the general way of determining the lower and upper estimates θ̲ and θ̄, the thing to do is to assume that I have already solved the problem and see what properties of those functions would follow.


Although I have already talked to you about the sample space,* I shall repeat it now. We denote by E the system of values x₁, x₂, ..., xₙ which we can observe. E was described as a sample point in a space of n dimensions.** This sample space will be denoted by capital W. I must say that W is not necessarily the whole space, but it is the set of different possible positions of sample points. If each xᵢ is normally distributed, or distributed on any continuous curve, the possible sample points form an n-dimensional continuum. Or it may be that the possible positions of E are limited to certain discrete points of the space, as will happen if each xᵢ can take certain discrete values, but nothing between. Anyway, W will denote the set of all possible positions of E.


Now I shall consider something which I shall denote by capital G, the general space. The points in this space will be denoted by g. They have n + 1 coordinates: the possible values of x₁, x₂, ..., xₙ, and also the value of θ₁. Now if I fix the value of θ₁ momentarily so that

$$ \theta_1 = \theta_1' \qquad (2) $$

I shall obtain a section of the general space, a plane whose equation is θ₁ = θ₁'. Any point g having coordinates x₁, x₂, ..., xₙ and θ₁' will lie on this plane no matter what be the values of x₁, ..., xₙ. It is seen that there is a one-to-one correspondence between the sample points E in W and the points g on any plane fixed in G by Eq.(2).


However, if we take into consideration the sample space W on the one hand, and the whole general space G on the other, we shall find that to any point E in the former corresponds a straight line, say L(E), in the latter, parallel to the θ₁ axis. Along such a line L(E) in the general space, the parameter θ₁ takes on all possible values proper to its nature, while the sample x₁, x₂, ..., xₙ remains constant. The situation is illustrated in the diagram on page 148.


The lines L(E), each corresponding to a particular sample E, are necessary for the geometrical interpretation of the functions θ̲(E) and θ̄(E), the method of calculation for which we have assumed to be known. Take any sample E, find the corresponding line L(E) and, having calculated the values θ̲(E) and θ̄(E), plot them on the line L(E). The interval between the two points plotted will be denoted by δ(E) and called the confidence interval corresponding to the particular sample E.

* The reader may refer back to page 16.

** Again it is convenient to recall that a "point" is a set of …

[Diagram on page 148] A set of n observations $E = x_1, x_2, \dots, x_n$ determines the line L(E).
This line pierces the plane $\theta_1 = \theta_1'$ at the point g. Note that point
g lies within the region of acceptance $A(\theta_1')$ and that the confidence
interval $\underline{\theta}(E), \overline{\theta}(E)$ covers $\theta_1'$. The line $L(E_1)$ arises from some other
sample. It pierces the plane $\theta_1 = \theta_1'$ at the point $g_1$. Note that
point $g_1$ lies outside the region $A(\theta_1')$ and that the confidence interval
$\underline{\theta}(E_1), \overline{\theta}(E_1)$ does not cover $\theta_1'$.


This can be done for all samples, and for each one we shall have in the
general space a line L(E), and on that line a confidence interval $\delta(E)$
extending from the value of $\underline{\theta}(E)$ to the value of $\overline{\theta}(E)$. Every new sample
gives a new line L(E) and a new pair of values for $\underline{\theta}(E)$ and $\overline{\theta}(E)$, and
hence a new confidence interval.


Now I shall define something that I shall call the region of
acceptance corresponding to a given value of $\theta_1$. Again, fix the value
of $\theta_1$, say $\theta_1'$, and consider the plane in the general space
corresponding to this constant value of $\theta_1$, in other words, the plane defined
by Eq.(2). If I take any sample $E = x_1, x_2, \dots, x_n$, and the
corresponding confidence interval $\delta(E)$ lying between $\underline{\theta}(E)$ and $\overline{\theta}(E)$ on the
line L(E), I shall find that the plane $\theta_1 = \theta_1'$ does one of two things:
it cuts the line L(E) either interior to the confidence interval $\delta(E)$,
or else exterior to it. If the former is the case, it can be said that
for this particular sample, $\underline{\theta}(E)$ and $\overline{\theta}(E)$ satisfy the inequalities

$$\underline{\theta}(E) \le \theta_1' \le \overline{\theta}(E). \qquad (3)$$


In other words, the point at which the plane $\theta_1 = \theta_1'$ cuts the line
L(E), for this particular sample, lies between $\underline{\theta}(E)$ and $\overline{\theta}(E)$. For some
other sample the situation may be otherwise, as the figure on page 148
illustrates.


Consider now all possible samples, and hence all possible lines
such as L(E), each with its confidence interval $\delta(E)$ properly calculated.
The plane $\theta_1 = \theta_1'$ will cut some of these lines at points interior to
the confidence intervals, and the set of all such points will be denoted
by $A(\theta_1')$, to be called the region of acceptance corresponding to the
value $\theta_1'$ of $\theta_1$. To be precise, the set of points $A(\theta_1')$ does not
necessarily form a continuous region in the plane $\theta_1 = \theta_1'$ bounded by
a closed curve, but in most practical problems it does, which justifies
my terminology. Generally speaking, the region of acceptance $A(\theta_1')$
corresponding to $\theta_1'$ is determined by the equation $\theta_1 = \theta_1'$ and the
inequalities (3).





I shall have to introduce some more new conceptions and notations.
Take into consideration some particular value $\theta_1'$ of $\theta_1$, not necessarily
the true one, and some particular sample point E. If it happens that
$\underline{\theta}(E)$ and $\overline{\theta}(E)$ calculated for this sample lie on opposite sides of $\theta_1'$,
so that inequality (3) is satisfied, then I shall say that the confidence
interval $\delta(E)$ corresponding to E covers the value $\theta_1'$. This will be
denoted by the symbol

$$\delta(E)\,\mathrm{C}\,\theta_1'. \qquad (4)$$
Again, if the coordinates of the sample E determine a point on
the plane $\theta_1 = \theta_1'$ that belongs to the region of acceptance $A(\theta_1')$,
then I shall say that the sample point E in the sample space W (of n
dimensions) falls within the region of acceptance $A(\theta_1')$ corresponding
to the value $\theta_1'$ of $\theta_1$. This will be denoted by

$$E \in A(\theta_1'). \qquad (5)$$

With this notation, standing for "E is an element of $A(\theta_1')$," you are,
of course, familiar from my previous lectures and conferences.


Now you will notice that the two events, denoted by (4) and (5)
respectively, are identical: whenever the confidence interval $\delta(E)$
covers any particular value $\theta_1'$ of $\theta_1$, the corresponding sample
point E must fall within the region of acceptance $A(\theta_1')$, and
conversely. In fact, (4) and (5) are two different ways of describing the
same thing. Owing to the circumstance that there exists a one-to-one
correspondence between the sample points in W and the points g on any
fixed plane $\theta_1 = \theta_1'$, any region of acceptance $A(\theta_1')$ on such a plane
will correspond to some particular region in W, which may also be
denoted by $A(\theta_1')$. It follows that the event (4) is equivalent to
"E falls within the region $A(\theta_1')$ in the sample space W."
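The equivalence of the events (4) and (5) can be checked mechanically. The Python sketch below assumes a one-parameter model that the lecture does not specify: the $x_i$ are independent normal with unknown mean $\theta_1$ and known standard deviation, and the interval is $\bar{x} \pm z\sigma/\sqrt{n}$. The names `interval`, `covers` and `in_acceptance_region` are hypothetical; the point is only that "the interval covers $\theta_1'$" and "the sample lies in $A(\theta_1')$" are decided by the same inequality.

```python
import random
from statistics import NormalDist, fmean

# Assumed model (not from the lecture): x_1..x_n independent N(theta_1, SIGMA^2),
# SIGMA known, so theta_1 is the only unknown parameter.
N, SIGMA, ALPHA = 10, 1.0, 0.95
HALF_WIDTH = NormalDist().inv_cdf((1 + ALPHA) / 2) * SIGMA / N ** 0.5


def interval(sample):
    """Lower and upper estimates for this sample: mean -/+ z * SIGMA / sqrt(N)."""
    m = fmean(sample)
    return m - HALF_WIDTH, m + HALF_WIDTH


def covers(sample, theta1_prime):
    """Event (4): the confidence interval delta(E) covers theta_1'."""
    lower, upper = interval(sample)
    return lower <= theta1_prime <= upper


def in_acceptance_region(sample, theta1_prime):
    """Event (5): E lies in A(theta_1'), stated as a condition on the sample."""
    return abs(fmean(sample) - theta1_prime) <= HALF_WIDTH


rng = random.Random(1)
trials = [([rng.gauss(0.0, SIGMA) for _ in range(N)], rng.uniform(-1.5, 1.5))
          for _ in range(2000)]
agree = all(covers(s, t) == in_acceptance_region(s, t) for s, t in trials)
print(agree)
```

Both outcomes (covered and not covered) occur among the random trial values, so the check is not vacuous.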


Until the present time we have not considered probabilities and
have not made any assumption as to what the true value of $\theta_1$ actually is.
So now we shall consider some probabilities.


If you notice that the two events (4) and (5) happen or fail
together, you will agree that the probabilities of the two events are
the same, whatever be the true values of $\theta_1$ and $\theta_2$. I may therefore
write

$$P\{\delta(E)\,\mathrm{C}\,\theta_1' \mid \theta_1, \theta_2\} = P\{E \in A(\theta_1') \mid \theta_1, \theta_2\}. \qquad (6)$$

This equation is to be read: "the probability that $\delta(E)$ covers $\theta_1'$ is
identical with the probability that E is an element of $A(\theta_1')$, whatever
be the true values of $\theta_1$ and $\theta_2$."
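You will recall that we started with the assumption (Eq.1) that
the two functions $\underline{\theta}$ and $\overline{\theta}$ are calculated by a rule that makes the
probability $P\{\underline{\theta}(E) \le \theta_1^\circ \le \overline{\theta}(E)\}$ equal to the confidence coefficient $\alpha$, whatever
be the true value $\theta_1^\circ$ of $\theta_1$ and whatever be the value of $\theta_2$. In other
words, we started with the assumption that

$$P\{\underline{\theta}(E) \le \theta_1^\circ \le \overline{\theta}(E) \mid \theta_1^\circ, \theta_2\} = \alpha. \qquad (1)$$

So now, with what we have just seen in (6), we may say that

$$P\{\underline{\theta}(E) \le \theta_1^\circ \le \overline{\theta}(E) \mid \theta_1^\circ, \theta_2\} = P\{\delta(E)\,\mathrm{C}\,\theta_1^\circ \mid \theta_1^\circ, \theta_2\} = P\{E \in A(\theta_1^\circ) \mid \theta_1^\circ, \theta_2\} = \alpha \qquad (7)$$

whatever be $\theta_1^\circ$ and whatever be $\theta_2$.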
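Eq.(7) lends itself to a simulation check. This sketch assumes the normal model taken up at the end of this conference ($\theta_1$ the mean, $\theta_2$ the standard deviation) and intervals $\bar{x} \pm t\,s'/\sqrt{n}$ with $t = 2.262$, the tabled Student value for 9 degrees of freedom and $\alpha = 0.95$. If the construction is right, the frequency of E falling in $A(\theta_1^\circ)$ stays near $\alpha$ for widely different values of $\theta_2$. Sample size, seed and number of trials are arbitrary choices.

```python
import random
from statistics import fmean, stdev

# Assumed setting: x_i ~ N(theta_1, theta_2^2), both parameters unknown.
N = 10
T_ALPHA = 2.262   # tabled Student t for N - 1 = 9 d.f. and alpha = 0.95
rng = random.Random(2024)


def t_interval(sample):
    """Interval mean -/+ T_ALPHA * s' / sqrt(n), with s' using divisor n - 1."""
    m, s_prime = fmean(sample), stdev(sample)
    h = T_ALPHA * s_prime / len(sample) ** 0.5
    return m - h, m + h


def acceptance_frequency(theta1, theta2, trials=10000):
    """Monte Carlo estimate of P{E in A(theta1) | theta1, theta2}."""
    hits = 0
    for _ in range(trials):
        lower, upper = t_interval([rng.gauss(theta1, theta2) for _ in range(N)])
        hits += lower <= theta1 <= upper
    return hits / trials


for theta2 in (0.1, 1.0, 10.0):
    print(theta2, acceptance_frequency(theta1=5.0, theta2=theta2))
```

With 10,000 trials per value the Monte Carlo standard error is about 0.002, so the three printed frequencies should all lie close to 0.95.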


Now what does this equation mean? It says that if $\theta_1^\circ$ be the
true value of the parameter $\theta_1$, then, whatever be the true value of
$\theta_2$, the probability of E falling within the region of acceptance
$A(\theta_1^\circ)$ is equal to $\alpha$.


What is the conclusion? If the functions $\underline{\theta}$ and $\overline{\theta}$ satisfy the
condition (1) that we wanted, then the region of acceptance $A(\theta_1^\circ)$
corresponding to any possible value $\theta_1^\circ$ of $\theta_1$ must have the property that
the probability of the sample point E falling within this region,
calculated under the assumption that $\theta_1^\circ$ is the true value of $\theta_1$, is
independent of the value of $\theta_2$ and equal to $\alpha$. On every plane
$\theta_1 = \text{const.}$ there will be a region of acceptance, and all such regions
satisfy the condition stated.


This is one of the necessary conditions, say (i), which the
regions of acceptance must satisfy if the functions $\underline{\theta}(E)$ and $\overline{\theta}(E)$ do
possess the property stated by Eq.(1). There are a few others referring
not to any particular region of acceptance, but to the whole system of them.


(ii) Whatever be the sample point E, there must exist at least
one value $\theta_1'$ of $\theta_1$ such that $E \in A(\theta_1')$.


(iii) If $\theta_1' < \theta_1''$ and a sample point E falls within both $A(\theta_1')$
and $A(\theta_1'')$, then it must fall within any region of acceptance $A(\theta_1''')$
for which $\theta_1' < \theta_1''' < \theta_1''$.


(iv) If $\theta_1' < \theta_1''$, and if the sample point E falls within every
one of the regions of acceptance $A(\theta_1)$ corresponding to $\theta_1' < \theta_1 < \theta_1''$,
then it must fall within both $A(\theta_1')$ and $A(\theta_1'')$.


I have no time to give you the proofs,* but if the functions
$\underline{\theta}(E)$ and $\overline{\theta}(E)$ possess the property described in Eq.(1), then the regions
of acceptance $A(\theta_1)$ must possess the properties (i)-(iv). It is easy
to see that this result reduces the construction of the upper and lower
estimates of $\theta_1$ to the problem of determining on each of the planes
$\theta_1 = \theta_1'$ a region $A(\theta_1')$ such that each of them separately possesses
the property (i) and the whole system has the properties (ii)-(iv).
Again I must leave this without proof.

However, I will indicate how the functions $\underline{\theta}$ and $\overline{\theta}$ are defined,
once the regions $A(\theta_1)$ satisfying (i)-(iv) have been determined.
Take any sample point E. According to (ii) above there will be at
least one value $\theta_1'$ of $\theta_1$ such that $E \in A(\theta_1')$. Now take the maximum
of the values of $\theta_1$ for which $E \in A(\theta_1)$ and denote it by $\overline{\theta}(E)$.
Likewise the minimum value of $\theta_1$ for which $E \in A(\theta_1)$ will be denoted by
$\underline{\theta}(E)$. It is not difficult to see that $\underline{\theta}(E)$ and $\overline{\theta}(E)$ thus defined do
possess all the properties of the lower and upper estimates of $\theta_1$.
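The recipe for $\underline{\theta}(E)$ and $\overline{\theta}(E)$ (take the smallest and the largest $\theta_1$ whose region of acceptance contains E) can be sketched by scanning a grid of trial values. This assumes a hypothetical known-variance normal model as a simple test case ($\sigma = 1$, region $|\bar{x} - \theta_1'| \le z/\sqrt{n}$); the grid, its range and the helper names are illustrative assumptions, and a grid only approximates the exact minimum and maximum to within one step.

```python
from statistics import NormalDist, fmean

# Hypothetical test case: x_i ~ N(theta_1, 1), with region of acceptance
# A(theta_1') = {E : |mean(E) - theta_1'| <= Z / sqrt(n)}.
ALPHA = 0.95
Z = NormalDist().inv_cdf((1 + ALPHA) / 2)


def in_acceptance_region(sample, theta1):
    return abs(fmean(sample) - theta1) <= Z / len(sample) ** 0.5


def estimates_from_regions(sample, grid):
    """Return (lower, upper): min and max of grid values theta_1 with E in A(theta_1).

    Property (ii) guarantees at least one such theta_1 exists; a finite grid
    can still miss them all if it is too coarse or too narrow.
    """
    accepted = [t for t in grid if in_acceptance_region(sample, t)]
    if not accepted:
        raise ValueError("no grid value has E in its region of acceptance")
    return min(accepted), max(accepted)


STEP = 0.001
grid = [-10.0 + i * STEP for i in range(20001)]
sample = [0.3, -0.8, 1.1, 0.4, 0.0, -0.2, 0.9, 0.5, -0.4, 0.7]
lower, upper = estimates_from_regions(sample, grid)
print(lower, upper)
```

For this model the result should agree with $\bar{x} \mp z/\sqrt{n}$ to within one grid step; for regions without a closed form the same scan still applies.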




* For details see my paper cited on page 28.
DR. DEMING: The system of those regions of acceptance will form
some sort of tube.


DR. NEYMAN: That is right. This tube may be more or less 
complicated. Now that the problem of confidence intervals is reduced 
to that of the regions of acceptance, we may look into the question 
whether this is easily solved.


I have started with the assumption that the elementary probability
law of the $x_i$ depends on two parameters $\theta_1$ and $\theta_2$, of which we desire
to estimate only the first. If there were only one unknown parameter
$\theta_1$, namely, the one to be estimated, then the only difference in our
discussion would refer to condition (1), from which we should
drop the words "whatever be the value of $\theta_2$." It will be seen that in
such a situation the solution of the problem is extremely easy and that
there are millions of different systems of regions of acceptance
satisfying (i)-(iv), and therefore many different functions $\underline{\theta}(E) \le \overline{\theta}(E)$
possessing the properties of the lower and upper estimates. All we
want to do is to select in the sample space W, for any fixed value of
$\theta_1$, say $\theta_1'$, a region $A(\theta_1')$ satisfying the condition that the integral
of $p(E \mid \theta_1')$ over this region is equal to $\alpha$, and perhaps to shift these
regions a little so as to satisfy the conditions (ii)-(iv).
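In the one-parameter case the freedom of choice is easy to exhibit. Assume, for illustration only, that $\bar{x}$ is normal with mean $\theta_1$ and known standard deviation $\sigma/\sqrt{n}$. For any $\gamma$ between 0 and $1-\alpha$, the region $A_\gamma(\theta_1')$ of samples with $a \le (\bar{x} - \theta_1')\sqrt{n}/\sigma \le b$, where $a = \Phi^{-1}(\gamma)$ and $b = \Phi^{-1}(\gamma + \alpha)$, has probability exactly $\alpha$ when $\theta_1'$ is true, and inverting it gives a different pair of estimates for each $\gamma$. The parametrization by $\gamma$ and the function names are assumptions of this sketch, not Neyman's notation.

```python
from statistics import NormalDist

PHI = NormalDist()   # standard normal
ALPHA = 0.95


def region_bounds(gamma, alpha=ALPHA):
    """Bounds (a, b) for the standardized mean in the gamma-system of regions."""
    return PHI.inv_cdf(gamma), PHI.inv_cdf(gamma + alpha)


def probability_of_region(gamma):
    """P{E in A_gamma(theta_1') | theta_1'}: should equal ALPHA for every gamma."""
    a, b = region_bounds(gamma)
    return PHI.cdf(b) - PHI.cdf(a)


def interval(xbar, sigma, n, gamma):
    """Lower and upper estimates obtained by inverting the gamma-regions."""
    a, b = region_bounds(gamma)
    scale = sigma / n ** 0.5
    return xbar - b * scale, xbar - a * scale


for gamma in (0.005, 0.025, 0.045):
    print(gamma, round(probability_of_region(gamma), 6),
          interval(xbar=0.25, sigma=1.0, n=10, gamma=gamma))
```

Every $\gamma$ gives coefficient $\alpha$ for the same sample yet a different interval, which is the "millions of different systems" remark in miniature.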


If, however, the probability law of the $x_i$ depends not only on $\theta_1$
but also on some other parameter $\theta_2$ (and perhaps even a third, or more),
then the situation becomes more complicated, because of the difficulty
of satisfying property (i) identically, whatever the value of $\theta_2$ may be.
Here we come to the necessity of considering regions called "similar
to the sample space with regard to the parameter $\theta_2$." The problem of
such regions has been discussed,* ** and in certain cases we know the
solution. In these cases, and they are sufficiently broad, we are in a
position to construct the confidence intervals for the parameter $\theta_1$.
Further progress depends on that in the theory of similar regions.


Let us now consider very briefly the question of the choice
between all possible systems of confidence intervals corresponding to
the same confidence coefficient $\alpha$. This choice is somewhat analogous
to the choice between several games of chance in which the
probabilities of winning are all the same, but the sums to be won are different.
In the study of confidence intervals, however, the choice is a little
more difficult. We should naturally try to get confidence intervals
as short as possible, and it may seem that the problem is that of
finding the pair of functions $\underline{\theta}$ and $\overline{\theta}$ for which the difference

$$\overline{\theta}(E) - \underline{\theta}(E)$$

is a minimum. This problem, however, does not have any solution.


* J. Neyman and E. S. Pearson, footnote on page 75.


** J. Neyman, footnotes on page 121.
Both $\underline{\theta}$ and $\overline{\theta}$ are functions of the sample point, and if we take into
consideration one sample in particular, say E', then it is possible to
find the system of the upper and lower estimates of $\theta_1$ such that the
difference between them,

$$\overline{\theta}(E') - \underline{\theta}(E'),$$

at that particular point E' is a minimum. But then the difference of
the same functions calculated from other sample points will be enormous.

This is a difficulty, and all we can do is to try to have the difference
$\overline{\theta}(E) - \underline{\theta}(E)$ a minimum "in general." This could be defined starting with
the following considerations.


If $\theta_1^\circ$ denotes the true value of $\theta_1$, then, for any of the possible
systems of confidence intervals, we must have

$$P\{\delta(E)\,\mathrm{C}\,\theta_1^\circ \mid \theta_1^\circ, \theta_2\} = \alpha,$$

as written in Eq.(7).


Suppose now that $\theta_1'$ is not the true value, $\theta_1' \ne \theta_1^\circ$. In this
circumstance it is obviously useless for the confidence interval to
cover $\theta_1'$; on the contrary, we may consider it to be an advantage if
the confidence interval fails to cover $\theta_1'$, provided of course it does
cover $\theta_1^\circ$, the true value. If a confidence interval covering the true
value of $\theta_1$ also covers $\theta_1'$, which is not the true value, then we may
say that this interval is "too broad."


Starting from this idea, we may define the "shortest" system
of confidence intervals corresponding to the confidence coefficient $\alpha$
as the one by which any value of $\theta_1$ that is not the true one is covered
with the smallest possible frequency; that is,

$$P\{\delta(E)\,\mathrm{C}\,\theta_1' \mid \theta_1^\circ, \theta_2\} = \text{minimum} \quad \text{if } \theta_1' \ne \theta_1^\circ. \qquad (8)$$


It is easy to show that this definition is immediately reduced to the
following conditions concerning the regions of acceptance:

$$P\{E \in A(\theta_1') \mid \theta_1', \theta_2\} = \alpha, \qquad (9)$$

$$P\{E \in A(\theta_1') \mid \theta_1'', \theta_2\} \le P\{E \in A^*(\theta_1') \mid \theta_1'', \theta_2\} \qquad (10)$$

for any $\theta_1'' \ne \theta_1'$ and any region $A^*(\theta_1')$ satisfying (9).

You probably recognize that in the theory of testing statistical
hypotheses we have a similar, though a little less complicated, problem:
that of the so-called uniformly most powerful tests.


Unfortunately, the "shortest" system of confidence intervals 
does not always exist, but this is a situation familiar in mathematics.
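Why the "shortest" system may fail to exist can be seen in a small case. The sketch assumes a known-$\sigma$ normal model and the systems of intervals $\bar{x} - b\sigma/\sqrt{n} \le \theta_1 \le \bar{x} - a\sigma/\sqrt{n}$ with $a = \Phi^{-1}(\gamma)$, $b = \Phi^{-1}(\gamma + \alpha)$; $\gamma = 0.025$ gives the symmetric system and $\gamma = 0.01$ a lopsided one (both choices are illustrative). Here `d` is the standardized distance $(\theta_1^\circ - \theta_1')\sqrt{n}/\sigma$. The lopsided system covers wrong values below the true one less often than the symmetric system, but wrong values above it more often, so neither minimizes (8) for every $\theta_1' \ne \theta_1^\circ$.

```python
from statistics import NormalDist

PHI = NormalDist()
ALPHA = 0.95


def prob_covers(d, gamma, alpha=ALPHA):
    """P{delta(E) C theta_1' | theta_1°} for the gamma-system (assumed model).

    d = (theta_1° - theta_1') * sqrt(n) / sigma; the interval covers theta_1'
    exactly when a <= Z + d <= b, where Z is standard normal.
    """
    a, b = PHI.inv_cdf(gamma), PHI.inv_cdf(gamma + alpha)
    return PHI.cdf(b - d) - PHI.cdf(a - d)


SYMMETRIC, LOPSIDED = 0.025, 0.01
for d in (-1.0, 0.0, 1.0):
    print(d, round(prob_covers(d, SYMMETRIC), 4), round(prob_covers(d, LOPSIDED), 4))
```

At `d = 0` both systems cover the true value with probability $\alpha$; they differ only in how often they cover wrong values, and each wins on one side.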




We all know that frequently rational numbers representing the square
root of a given number do not exist; the same happens with real solutions
of a quadratic, etc. If a particular problem has no solution, we
have to formulate some other problem that would satisfy the practical
statistician. So, whenever the "shortest" system of confidence intervals
does not exist, we may look for the so-called "short unbiased" system,
which is defined as follows.


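Take the probability

$$P\{\delta(E)\,\mathrm{C}\,\theta_1' \mid \theta_1, \theta_2\} \qquad (11)$$

of the confidence interval $\delta(E)$ covering some particular value $\theta_1'$ of
$\theta_1$. This is, of course, a function of $\theta_1$. If $\theta_1 = \theta_1'$ then, according
to the general properties of the confidence intervals, particularly as
expressed by Eq.(7), this probability will be equal to $\alpha$; that is to say,
the relative frequency of $\delta(E)$ covering $\theta_1'$, when $\theta_1'$ happens to be
the true value, will be $\alpha$. Now if the true value of $\theta_1$ is shifted
either way from $\theta_1'$, we shall require that the probability (11) should
fall off as quickly as possible. Altogether, then, we have these
requirements:

$$
\begin{aligned}
&P\{\delta(E)\,\mathrm{C}\,\theta_1' \mid \theta_1', \theta_2\} = \alpha, \\
&\frac{d}{d\theta_1} P\{\delta(E)\,\mathrm{C}\,\theta_1' \mid \theta_1, \theta_2\} = 0 \quad \text{if } \theta_1 = \theta_1', \\
&\frac{d^2}{d\theta_1^2} P\{\delta(E)\,\mathrm{C}\,\theta_1' \mid \theta_1, \theta_2\} = \text{minimum} \quad \text{if } \theta_1 = \theta_1'.
\end{aligned}
\qquad (12)
$$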


As a matter of fact, the minimum value of the second derivative in (12)
is always negative. These conditions also are readily reduced to
similar ones referring to the regions of acceptance.

If we manage to obtain a system of confidence intervals satisfying
the conditions (12) for any possible value $\theta_1'$ of $\theta_1$, then
we shall call it short unbiased. The justification of the term
unbiased is that if the conditions (12) are satisfied, and the second
derivative is negative, then the true value of $\theta_1$ will always be
covered by the confidence intervals more frequently than any other value.
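Conditions (12) can be examined numerically for an assumed example. With $\sigma$ known and intervals $\bar{x} - b\sigma/\sqrt{n} \le \theta_1 \le \bar{x} - a\sigma/\sqrt{n}$, where $a = \Phi^{-1}(\gamma)$ and $b = \Phi^{-1}(\gamma+\alpha)$, the probability (11) is $\Phi(b - \delta) - \Phi(a - \delta)$ with $\delta = (\theta_1 - \theta_1')\sqrt{n}/\sigma$. The sketch estimates its first and second derivatives at $\theta_1 = \theta_1'$ by central finite differences: the symmetric system ($\gamma = 0.025$) has a zero first derivative and a negative second derivative, while the lopsided one ($\gamma = 0.01$) violates the second condition of (12). The model and the step size are illustrative choices.

```python
from statistics import NormalDist

PHI = NormalDist()
ALPHA = 0.95


def coverage_curve(delta, gamma, alpha=ALPHA):
    """Probability (11) as a function of the true theta_1 (assumed model).

    delta = (theta_1 - theta_1') * sqrt(n) / sigma, so derivatives in delta
    differ from those in theta_1 only by a positive constant factor.
    """
    a, b = PHI.inv_cdf(gamma), PHI.inv_cdf(gamma + alpha)
    return PHI.cdf(b - delta) - PHI.cdf(a - delta)


def derivatives_at_zero(gamma, h=1e-3):
    """Central finite differences for the 1st and 2nd derivatives in (12)."""
    def f(t):
        return coverage_curve(t, gamma)
    first = (f(h) - f(-h)) / (2 * h)
    second = (f(h) - 2 * f(0.0) + f(-h)) / h ** 2
    return first, second


for gamma in (0.025, 0.01):
    first, second = derivatives_at_zero(gamma)
    print(gamma, round(coverage_curve(0.0, gamma), 6), round(first, 6), round(second, 4))
```

A nonzero first derivative means the curve (11) peaks away from $\theta_1'$, so some wrong value is covered more often than the true one; that is the bias the second condition rules out.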


All these things are not yet very familiar, and I shall probably have
to refer to my publication quoted (footnote page 28), where you will find
many details and illustrations.


QUESTION: Did you say that the 2d derivative in (12) is always
negative, whether you choose the minimum value or not, whether you get
a minimum or not?


DR. NEYMAN: This minimum 2d derivative is always negative.
QUESTION: But did you say all values of the 2d derivative will
be negative?

DR. NEYMAN: Whatever problem we consider, and whatever be the
fixed value $\theta_1'$, we may find many systems of confidence intervals for
which the first two conditions of (12) are satisfied, namely,

$$P\{\delta(E)\,\mathrm{C}\,\theta_1' \mid \theta_1', \theta_2\} = \alpha \qquad \text{[as in the identity (9)]}$$

and

$$\frac{d}{d\theta_1} P\{\delta(E)\,\mathrm{C}\,\theta_1' \mid \theta_1, \theta_2\} = 0 \quad \text{for } \theta_1 = \theta_1' \qquad \text{[the second condition in (12)]}.$$

But for those systems, the second derivative

$$\frac{d^2}{d\theta_1^2} P\{\delta(E)\,\mathrm{C}\,\theta_1' \mid \theta_1, \theta_2\} \quad \text{at } \theta_1 = \theta_1'$$

may have various values, and some of them will be positive; but my
point is that if we choose the system for which this 2d derivative is
smaller than for any other system, then the minimum value of the 2d
derivative will be negative.


DR. DEMING: The question, if I might restate it, is why aren't
you satisfied merely to have the 2d derivative negative? Why do you
require it to be as small as possible?


DR. NEYMAN: This has something to do with the other part of the
term used to describe the system satisfying the conditions in (12),
namely, "short."


Let us consider what could be the graphs of

$$P\{\delta(E)\,\mathrm{C}\,\theta_1' \mid \theta_1, \theta_2\} \qquad (11)$$

considered as functions of $\theta_1$, corresponding to the two cases, when the
second derivative in (12) is merely negative, and when it is a minimum.
We shall get a picture like the one in the diagram on the next page.
The two curves have the same ordinate $\alpha$ for $\theta_1 = \theta_1'$, and both have a
maximum at that point, but the curve corresponding to the minimum of
the second derivative will fall off quicker than the other. The smaller
the 2d derivative, the steeper the maximum. In consequence, if the true
value of $\theta_1$ happens to be $\theta_1'' \ne \theta_1'$, then $\theta_1'$ will be less frequently
covered by the confidence intervals if the condition of the minimum
2d derivative is satisfied. Of course, strictly speaking, this refers
to values $\theta_1''$ in the vicinity of $\theta_1'$.
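The effect of a smaller second derivative can be illustrated with two systems that both satisfy the first two conditions of (12). Under an assumed known-$\sigma$ normal model, a symmetric interval centred on $\bar{x}$ and one centred on the first observation $x_1$ alone are both unbiased with coefficient $\alpha = 0.95$, but at the peak the first has second derivative $-2z\varphi(z)\,n/\sigma^2$ against $-2z\varphi(z)/\sigma^2$ for the second, so its curve (11) falls off faster. The single-observation interval is chosen here only as a convenient comparison; it is not one of Neyman's examples.

```python
from statistics import NormalDist

PHI = NormalDist()
Z = PHI.inv_cdf(0.975)   # alpha = 0.95 for both systems


def coverage_symmetric(shift, scale):
    """Probability (11) for an interval centre -/+ Z * scale (assumed model).

    shift = theta_1 - theta_1'; scale is the standard deviation of the
    statistic the interval is centred on (sigma/sqrt(n) for the mean,
    sigma for the first observation alone).
    """
    d = shift / scale
    return PHI.cdf(Z - d) - PHI.cdf(-Z - d)


n, sigma = 10, 1.0
for shift in (0.0, 0.1, 0.3):
    mean_based = coverage_symmetric(shift, sigma / n ** 0.5)
    single_obs = coverage_symmetric(shift, sigma)
    print(shift, round(mean_based, 4), round(single_obs, 4))
```

Both columns start at 0.95 for a zero shift; the first column drops faster, i.e. the mean-based system covers nearby wrong values less often.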


It may be useful to conclude this exposition by quoting a
practical example. Consider the case where it is known that all the
observations $x_1, x_2, \dots, x_n$ follow the same normal law and are mutually
independent. Then

$$p(E \mid \xi, \sigma) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \xi)^2\right\}, \qquad (13)$$

where $\xi$ and $\sigma$ are unknown. Suppose it is desired to estimate $\xi$. This
is just the case where the probability law depends on two unknown
parameters and we have to deal with regions similar to the sample
space with respect to $\sigma$.


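[Diagram: $P\{\delta(E)\,\mathrm{C}\,\theta_1' \mid \theta_1, \theta_2\}$ plotted against $\theta_1$, with two curves labelled "2d derivative merely negative" and "2d derivative a minimum".]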





The short unbiased system of confidence intervals is provided by

$$
\begin{aligned}
\overline{\xi}(E) &= \bar{x} + t_\alpha\, s/\sqrt{n-1} = \bar{x} + t_\alpha\, s'/\sqrt{n},\\
\underline{\xi}(E) &= \bar{x} - t_\alpha\, s/\sqrt{n-1} = \bar{x} - t_\alpha\, s'/\sqrt{n},
\end{aligned}
\qquad (14)
$$

where $\bar{x}$ is the sample mean and s the S.D. of the sample, defined by

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad (15)$$

as was used on page 134; s' is the estimate of $\sigma$ written on page 135,
and $t_\alpha$ may be taken directly from Fisher's tables, according to the
number of degrees of freedom n-1 and corresponding to his $P = 1 - \alpha$.
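Eq.(14) is straightforward to compute. In this sketch the data are hypothetical and $t_\alpha = 2.262$ is the tabled value for 9 degrees of freedom and $\alpha = 0.95$ (P = 0.05 in Fisher's table); `pstdev` gives s as in (15), `stdev` gives s' with divisor n-1, and the function checks that the two expressions in (14) agree.

```python
from math import isclose, sqrt
from statistics import fmean, pstdev, stdev


def short_unbiased_interval(sample, t_alpha):
    """Both forms of Eq.(14) for the normal mean (sketch).

    t_alpha must be read from a t table for n - 1 degrees of freedom.
    """
    n = len(sample)
    xbar = fmean(sample)
    s = pstdev(sample)        # Eq.(15): divisor n
    s_prime = stdev(sample)   # estimate of sigma with divisor n - 1
    half_a = t_alpha * s / sqrt(n - 1)
    half_b = t_alpha * s_prime / sqrt(n)
    assert isclose(half_a, half_b)   # the two expressions in (14) agree
    return xbar - half_a, xbar + half_a


data = [4.9, 5.3, 5.1, 4.7, 5.0, 5.4, 4.8, 5.2, 5.1, 4.9]   # hypothetical sample
print(short_unbiased_interval(data, t_alpha=2.262))         # 9 d.f., alpha = 0.95
```

The equality of the two half-widths follows from $s'^2 = n s^2/(n-1)$, so either form may be used.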


You see that in this particular case the theory of estimation
is not dogmatic, and that it brings us to a solution of the problem
equivalent to what is familiar. This is an illustration of my statement
at the end of the preceding conference to the effect that the rôle of
the theory is very modest compared to the achievements
of the practical man.
However, it is worth noting that the traditional procedure of
estimating $\sigma$ in the same case (13) is less successful: the corresponding
confidence intervals are biased, and the frequency of their covering
some of the wrong values of $\sigma$ is actually greater than that of covering
the true one.* But even here the advantage of the unbiased system of
confidence intervals is only a very slight one: what the uncontrollable
instinct of the practical man has overlooked is relatively unimportant.
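The bias mentioned here can be seen numerically. The sketch assumes that "the traditional procedure" means limits for $\sigma^2$ that cut off equal tails of 0.025 from the $\chi^2$ distribution of $\sum (x_i - \bar{x})^2/\sigma^2$; the lecture does not say which procedure is meant. With n = 11 (10 degrees of freedom, tabled points 3.247 and 20.483), the chance of covering $\sigma'$ is $F(20.483\,r) - F(3.247\,r)$, where $r = (\sigma'/\sigma)^2$ and F is the $\chi^2_{10}$ distribution function, which has a closed form for even degrees of freedom.

```python
from math import exp, factorial


def chi2_cdf_even(x, k):
    """Chi-square distribution function for an even number k of degrees of freedom."""
    half = x / 2.0
    return 1.0 - exp(-half) * sum(half ** j / factorial(j) for j in range(k // 2))


# Assumed "traditional" limits: n = 11, SS = sum of (x_i - mean)^2, and
# SS / C2 <= sigma^2 <= SS / C1 with equal 0.025 tails of chi-square, 10 d.f.
K, C1, C2 = 10, 3.247, 20.483


def prob_covers_sigma(ratio):
    """Chance that the limits cover sigma' when sigma is true; ratio = (sigma'/sigma)^2.

    SS / sigma^2 follows chi-square with K d.f., and sigma' is covered exactly
    when C1 * ratio <= SS / sigma^2 <= C2 * ratio.
    """
    return chi2_cdf_even(C2 * ratio, K) - chi2_cdf_even(C1 * ratio, K)


for ratio in (0.9, 1.0, 1.05):
    print(ratio, round(prob_covers_sigma(ratio), 4))
```

The true $\sigma$ is covered with probability 0.95, while a value about 2.5 per cent larger is covered slightly more often, which matches the remark that the bias exists but is small.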


MRS. KANTOR: Is there anything in English literature on
confidence intervals?


DR. NEYMAN: It is a relatively new subject. Apart from my
paper in the Phil. Trans., already cited on page 28, where I give the
theory, the references are as follows:**


1. J. Neyman: On the two different aspects of the representative
method. J. Roy. Stat. Soc., 97, 558-625, 1934. See particularly
pp. 589-593. You will find here the description of the general
idea in the simplest case. Formula (5) on p. 565 corresponds to
my formulas (14) here. Cited on page 90.


2. C. J. Clopper and E. S. Pearson: The use of confidence or
fiducial limits illustrated in the case of the binomial.
Biometrika, 26, 404-413, 1934.


3. T. Matuszewski, J. Neyman and J. Supinska: Statistical studies
in questions of bacteriology. Supplement J. Roy. Stat. Soc., 2,
63-82, 1935. Here are given tables of the confidence intervals
for the number of living bacteria in a suspension.


4. J. Neyman: On the problem of confidence intervals. Ann. Math.
Statistics, 6, 111-116, 1935. Here it is shown that in the
case when the $x_i$ are discontinuous, it may be impossible to
satisfy the condition (i) exactly, but only in the form of an
inequality $P\{\delta(E)\,\mathrm{C}\,\theta_1^\circ \mid \theta_1^\circ\} \ge \alpha$.


5. J. Przyborowski and H. Wilenski: Statistical principles of
routine work in testing clover seed for dodder. Biometrika,
27, 273-292, 1935. Here the authors give the confidence
intervals for the unique constant on which the Poisson
distribution depends. Cited on page 27.


* A similar circumstance exists in the problem of testing hypotheses
when standard deviations are involved, as was brought out by E. S. Pearson
in the discussion of a paper presented by R. A. Fisher at a meeting of
the Royal Statistical Society in December 1934. See J. Roy. Stat.
Soc., 98, 39-82, 1935, pages 66-67 in particular. Editor.


** The notion underlying confidence intervals seems to have been
introduced by E. B. Wilson, J. Amer. Stat. Assoc., 22, 209-212, 1927. Editor.
6. Robert W. B. Jackson: Some problems of testing statistical
hypotheses and estimation relating to the question of the
relative accuracy of measurement. (To be published in the
Statistical Research Memoirs, 1937.) The author applies
the general theory as sketched in these conferences to derive
the confidence intervals for various problems.


With respect to all the previous publications concerning
confidence intervals for which I am either totally or partly responsible, I
have to say that they contain a certain artificiality which is now
removed, and from which the last publication, in the Philosophical Transactions
1937, is free (footnote page 28). This artificiality consisted in assuming
that the unknown parameter to be estimated is itself a random variable
following an arbitrary probability law $p(\theta)$. The arbitrariness of $p(\theta)$
extended to the situation where it could reduce to unity just for one
particular value $\theta^\circ$ of $\theta$, being zero elsewhere, in which case $\theta$ would
be a constant. This circumstance served as an excuse, but the mere
assumption of $\theta$ being a random variable does seem to be artificial.


I should perhaps make a second remark concerning the connection
between confidence intervals and the fiducial probability theory of
R. A. Fisher. In my paper of 1934 I stated, as I thought then, that
the two theories are essentially equivalent. However, I made some protest
against the terms "fiducial probability" and "fiducial distribution"
of the parameter $\theta$, for which Fisher* adopted the notation

$$df = f(T \mid \theta)\, d\theta, \qquad (16)$$

T being some function of the $x_i$. Later, however, when discussing my
paper, Fisher insisted that fiducial distribution is a conception
essential in his theory. Similar statements regarding fiducial
distributions of parameters you will find in his recent article in the
Annals of Eugenics, together with some criticism of the theory of
confidence intervals.**


DR. DEMING: There was also a publication by T. E. Sterne in the
Proceedings of the National Academy, very much of the same thing.

DR. NEYMAN: These statements of Fisher forced me to alter my
opinion, and now I think that the theory of confidence intervals and
that of fiducial probability are two different things.


* R. A. Fisher: "Inverse probability." Proc. Cambridge Phil. Soc., 26,
528-535, 1930; page 534 in particular.

** R. A. Fisher: "The fiducial argument in statistical inference."
Annals of Eugenics, 6, 391-398, 1936.

*** T. E. Sterne, Proc. Nat'l Academy Sci., 20, 60s 603, 1934.


DR. DEMING: Fisher uses his "fiducial distributions" to
calculate fiducial limits, in other words, confidence limits. His
"fiducial distribution" is not used in differential form, but in
integrations, as I understand it.


DR. NEYMAN: That is right, and the numerical results are the
same. It is possible that they will always be the same. But the
theories seem to differ. In the theory of confidence intervals there
is no room for anything like the fiducial distribution of $\theta$ in the
form (16). My impression is that the difference between the two
theories is recognized also by Fisher: in his last publication in the
Annals of Eugenics (footnote page 158) he warns the reader of some
contradictions allegedly inherent in the theory of confidence intervals.


DR. DEMING: Could not one very easily dispose of the notion of a
"fiducial distribution of $\theta$" by simply pointing out that if in repeated
sampling you were to make up a frequency table of the various values of
$\theta$ that occur in samples, all selected for a particular value or
differential range of T, it would be found that the distribution of $\theta$
is what is commonly called the prior distribution of $\theta$, and this may be
and usually would be entirely unrelated to the fiducial distribution
(16)?


DR. NEYMAN: I must admit that I am not able to follow Fisher's
theory and therefore I cannot criticize it. I know only that in the
theory of confidence intervals there is no room for anything like $d\theta$.


MR. WILLIAM C. SHELTON: In solving algebraic equations, for
example, you may say you have an approximation, then you say you will
improve upon this approximation. One way of doing so is to use
Newton's method of approximation. In that case we do differentiate
with respect to a constant. We do it simply because we realize--we
know it has been proved--that in order to find the value of the
unknown, we may consider it as a variable and locate the root of the
equation. We are not assuming that the unknown that we are trying to
find actually varies.
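The procedure Mr. Shelton describes can be sketched in a few lines of Python. The unknown root is treated as a variable x only in order to compute the corrections $x - f(x)/f'(x)$; the equation solved here ($x^2 = 2$), the starting value and the tolerance are illustrative choices.

```python
def newton(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Newton's method: treat the unknown root as a variable x and improve x."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")


root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

The differentiation with respect to x is only a device for locating the fixed number $\sqrt{2}$; nothing in the procedure assumes that the root itself varies.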


DR. NEYMAN: My impression is that the situation is somewhat
different. Formerly I had thought something similar to what you say,
and I said (in 1934) that "fiducial probability" and "fiducial
distribution" are faulty terms, sort of suggesting conceptions that were not
in the mind of their author. I find, however, that this is not so,
for look what R. A. Fisher said himself when discussing my paper of
1934 (p. 618 loc. cit.): "Here, again, there might be serious difficulties
in respect to the mutual consistency of the different inferences to be
drawn; for, with a single parameter, it could be shown that all the
inferences might be summarized in a single probability distribution for
that parameter, and that, for this reason, all were mutually consistent;"
I have underlined the passage that seems to be relevant, and you will
notice that even the qualifying "fiducial" has been left out. But I
will repeat again that I do not understand the fiducial theory and, in
particular, I am not quite clear what R. A. Fisher had in mind when
writing the above statements, or what are the possible inconsistencies
he mentioned.


DR. DEMING: This requirement that the second derivative be not only
negative but as small a negative number as possible: does this not tie
up directly with your idea of trying to avoid "errors of the second
kind"?


DR. NEYMAN: Both in the theory of estimation and in that of
testing hypotheses we have many similar formulas and, in fact, many
similar ideas. However, the conception of errors of the first and
second kinds (p. 45) is specific to the theory of testing hypotheses
and does not enter into that of estimation. The reason is that as a
result of testing a statistical hypothesis, two kinds of action are
possible: (1) reject the hypothesis, and (2) do not reject the
hypothesis. In consequence there are two different ways in which we
may be wrong, i.e. there are two kinds of errors to be cautious of.

On the other hand, in the theory of estimation the result is always
of the same form: we say $\theta$ is this or that, or satisfies this inequality
or some other. This statement may be correct or wrong, but there is
only one way of its being wrong: $\theta$ is not what it is stated to be.
Best unbiased estimate 131, 132
E. BOREL 2, 14
sugar beets
C. Chandra Sekar; listed under
Change of variable 17
Composite hypothesis 18
144 ff
Errors of the 1st


Confidence interval 29, 139,
unbiased system 153

H. CRAMER 1
rejection) 45, 47, 121, 123
Daniel DUGUE 141


statistics in, 114 
Sir Arthur EDDINGTON 78 
Ripird cad method 109 ff
and 2d kind
charts p. 61
133-136
Estimation, theory of, 28, 128


conference 127-142
Expansion of 1, consideration of
the 1000th decimal 5, 15, 145
and Wilcox) 104, 105

Farmer and shoemaker 115 ff,
charts 116, 118

Fiducial probability 158; fiducial distribution 159

First kind, errors of; see errors
of the 1st and 2d kind

R. A. FISHER 28,749.71 56,. 573
156 wel 5) 169." 188
Milton FRIEDMAN 104, 105
Half drill strips 53 ff, 61, 62,
63

N. A. HANSEN 64 

Harold HOTELLING 107, 119, 141 

Hypothesis, simple 18, composite
under Student (see also testing)

Integral probability law 18, 16 

K. IWASZKIEWICZ 51, 60


Robert W. B. JACKSON 158


A. KOLMOGOROFF 2
S. KOLODZIEJCZYK 51, 60
Level of fertility 57

H. LEVY 32, 70, 78

Likelihood 132 ff 
P. C. MAHALANOBIS 79, 81
gation) 97
120
a2, 49, 507.64, Bey 57, 58,
6a, 109 te Pf

T. MATUSZEWSKI 24, es 157

Maximum Likelihood 132 ff, 141
smooth test 58, with E. S.
Egon S. PEARSON 60, 75, 157
E. J. G. PITMAN 132, 141
POISSON law 24, 49, 157

Power of a test 47, 48
mind 130

PRZYBOROWSKI 26, 27, 30, Sk, 1s?

Purposive selection 89, 90
22

Random sampling 89, 90, 99, 100
Second kind, errors of; see
W. A. SHEWHART 28, 78
Size of sample unit 107, 108
kind: 76
94, 102
63; Student's test 36, 46, 75;
86

P. V. SUKHATME 98, 102

Miss J. SUPINSKA 24, 25, 197
Mrs. Y. TANG 67, 74, 78, 79,
Unbiased test 48, 121, 123
Uniformity trials 55, 62

J. V. USPENSKY 120
The sample space W, 16, lel, 147
B. L. WELCH ol

G. A. WIEBE 56

Sidney WILCOX 104, 105 

H. WILENSKI 26, 27, 157 